Minimization for Generalized Boolean Formulas * 



Edith Hemaspaandra 
Rochester Institute of Technology, 
Rochester, NY, USA 

Henning Schnoor 
Christian-Albrechts-Universität zu Kiel,
Kiel, Germany 



Abstract 

The minimization problem for propositional formulas is an important optimization
problem in the second level of the polynomial hierarchy. In general, the problem
is $\Sigma_2^p$-complete under Turing reductions, but restricted versions are tractable.
We study the complexity of minimization for formulas in two established frame- 
works for restricted propositional logic: The Post framework allowing arbitrarily 
nested formulas over a set of Boolean connectors, and the constraint setting, al- 
lowing generalizations of CNF formulas. In the Post case, we obtain a dichotomy 
result: Minimization is solvable in polynomial time or coNP-hard. This result also 
applies to Boolean circuits. For CNF formulas, we obtain new minimization algo- 
rithms for a large class of formulas, and give strong evidence that we have covered 
all polynomial-time cases. 



1 Introduction 

The minimization problem for propositional formulas is one of the most natural opti- 
mization problems in the polynomial hierarchy. In fact, a variant of this problem was 
a major motivation for the definition of the polynomial hierarchy [MS72]. The goal of
minimization is to find a minimum equivalent formula to a given input formula. In this 
paper, we study the minimum equivalent expression (MEE) problem, where the input 
is a formula $\varphi$ and a number $k$, and the question is to determine whether there exists a
formula which is equivalent to $\varphi$ and of size at most $k$ (we study different notions of
"size"). 

The problem is trivially in $\Sigma_2^p$, but a better lower bound than coNP-hardness had
been open for many years. In [HW02], Hemaspaandra and Wechsung proved the problem
to be (many-one) hard for parallel access to NP. Recently, it was shown to be
$\Sigma_2^p$-complete under Turing reductions by Buchfuhrer and Umans [BU11].

Minimization in restricted fragments of propositional logic has been studied for
the case of Horn formulas in order to find small representations of knowledge
bases [HK95]. Prime implicates, a central tool for minimizing Boolean formulas
[Qui52], have been used in several areas of artificial intelligence research. We
mention [ACG+06], where prime implicates were used in peer-to-peer data management
systems for the semantic web, and [Bit08], which applies them in the context
of belief change operators. Two-level logic minimization is an important problem in
logic synthesis [UVSV06]. Different variants of minimization have been studied: The
problem is $\Sigma_2^p$-complete for CNF formulas [Uma01], NP-complete for Horn formulas
[Bv94], and solvable in P for 2CNF formulas [Cha04].

In this paper we study the complexity of minimization for syntactically restricted 
formulas. Two frameworks for restricting the expressive power of propositional logic 
have been used for complexity classifications in recent years: 

• The Post framework [Pos41] considers formulas that instead of the usual operators
$\wedge$, $\vee$, and $\neg$, use an arbitrary set $B$ of Boolean functions as connectors.
Depending on $B$, the resulting formulas may express only a subset of all Boolean
functions, or may be able to express all functions more succinctly than the usual
set $\{\wedge, \vee, \neg\}$.

• The constraint framework [Sch78] studies formulas in CNF form, where the
types of allowed clauses (e.g., Horn, 3CNF, or XOR clauses) are defined in a
constraint language $\Gamma$ containing "templates" of generalized CNF-clauses that
are allowed in so-called $\Gamma$-formulas.

In both frameworks, a wide range of complexity classifications has been obtained.
For the Post framework, we mention the complexity of satisfiability [Lew79], equivalence
[Rei01], modal satisfiability [HSS10], and non-monotonic logics [TV10]. In the
constraint setting, besides the satisfiability problem [Sch78, ABI+09], also enumeration
of solutions [CH97], equivalence and isomorphism [BHRV02, BHRV04], circumscription
[NJ04], and unique satisfiability [Jub99] have been studied; see [CV08] for a
survey. The complexity of satisfiability for non-Boolean domains is also a very active
field, see e.g., [Bul06, BV08].

For many considered problems, "dichotomy results" were achieved, proving that
every choice of $B$ or $\Gamma$ leads to one of the same two complexity degrees, usually
polynomial-time solvable and NP-complete. This is surprising since there are infinitely
many sets $B$ and $\Gamma$, and we know that there are, for example, infinitely many degrees
of complexity between P and NP-completeness unless P = NP [Lad75].

A "Galois Connection" between constraint languages and closure properties in 
the Post setting determines the complexity of many computational problems [JCG97,
SS08]. In contrast, we show that these tools do not apply to minimization.

In the Post setting, we obtain a complete classification of the tractable cases of the 
minimization problem: For a set B of Boolean functions, the problem to minimize 
$B$-formulas is solvable in polynomial time or coNP-hard, hence avoiding the degrees
between P and coNP-completeness. Our results in this framework apply to both the
formula and the circuit case, and to different notions of size of formulas and circuits. 

In the constraint case, we define irreducible constraint languages, among which we 
identify a large class whose formulas can be minimized in polynomial time, and prove 
NP- or coNP-hardness results for most of the remaining cases. More precisely, we 
prove the following: For an irreducible constraint language for which equivalence can 
be tested in polynomial time, the minimization problem is NP-complete if the language 
can express (dual) positive Horn, and can be solved in polynomial time otherwise.
NP-completeness for the positive Horn case was shown in [Bv94]. Our analysis thus shows
that previous results on the hardness of minimizing positive Horn formulas
were "optimal": As soon as a CNF fragment of propositional logic is strictly less
expressive than positive Horn, formulas can be minimized efficiently. Since irreducibility
is a natural condition for constraint languages that are used in knowledge represen- 
tation, a consequence of our result is that knowledge bases that do not need the full 
expressive power of positive Horn admit efficient "compression algorithms." 

Our contribution is threefold: 

1. We give new and non-trivial minimization algorithms for large classes of formulas.

2. In the Post setting, we prove that all remaining cases are coNP-hard. In the con- 
straint setting, we give strong evidence that larger classes do not have efficient 
minimization algorithms. 

3. We show that minimization behaves very differently than many other problems 
in the context of propositional formulas: The usually-applied algebraic tools 
for the constraint setting cannot be applied to minimization. Also, complexities 
in the Post- and constraint framework differ strongly: In particular, the con- 
straint framework contains NP-complete cases; such cases do not exist in the 
Post framework (unless NP = coNP). 

2 Minimization in the Post Framework 

We fix a finite set $B$ of Boolean functions of finite arity. We define $B$-formulas
inductively: A variable $x$ is a $B$-formula, and if $\varphi_1, \dots, \varphi_n$ are $B$-formulas, and $f$ is
an $n$-ary function from $B$, then $f(\varphi_1, \dots, \varphi_n)$ is a $B$-formula. We often identify the
function $f$ and the symbol representing it. $\mathrm{VAR}(\varphi)$ denotes the set of variables in a
formula $\varphi$. We write $\varphi(x_1, \dots, x_n)$ to indicate that $\mathrm{VAR}(\varphi) = \{x_1, \dots, x_n\}$. For
an assignment $\alpha\colon \mathrm{VAR}(\varphi) \to \{0, 1\}$, the value of $\varphi$ for $\alpha$, $\varphi(\alpha)$, is defined in the
straightforward way. We write $\alpha \models \varphi$ if $\varphi(\alpha) = 1$, and say that $\alpha$ satisfies $\varphi$.
Formulas $\varphi_1$ and $\varphi_2$ are equivalent if $\varphi_1(\alpha) = \varphi_2(\alpha)$ for all $\alpha$; we then write $\varphi_1 \equiv \varphi_2$.
The satisfiability problem for $B$-formulas, i.e., the problem to decide whether a given
$B$-formula has at least one solution, is denoted with $\mathrm{SAT}(B)$.

Formulas can be succinctly represented as circuits, which are essentially DAGs 
where formulas are trees. Although every circuit can be rewritten into a formula, the 
size of the resulting formula can be exponential in the size of the circuit. 






In the Post framework, we study two variations of the minimization problem that
differ in the notion of the size of a formula $\varphi$. An obvious way to measure size is
the number of occurrences of literals, which we denote with $\mathrm{size}_l(\varphi)$. The second
measurement is motivated by the study of Boolean circuits, where the size of a circuit
is usually the number of non-input gates. For a formula, this is the number of appearing
function symbols. We denote this number with $\mathrm{size}_s(\varphi)$. Our results also hold for
obvious variations of these measures (e.g., counting variables instead of occurrences,
also counting input gates, etc.). For a set $B$ as above, we define:

Problem: $\mathrm{MEE}^{F/C}_{s/l}(B)$

Input: A $B$-formula/circuit $\varphi$ and a natural number $k$

Question: Is there a $B$-formula/circuit $\psi$ with $\mathrm{size}_{s/l}(\psi) \le k$ and $\varphi \equiv \psi$?
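
To make the two size measures concrete, the following is a minimal Python sketch. The nested-tuple encoding of formulas and the helper names `variables`, `evaluate`, `size_l`, `size_s`, and `equivalent` are assumptions made for this illustration; they are not notation from the paper. The equivalence test is plain brute force over all assignments and therefore exponential.

```python
from itertools import product

# Assumed encoding: a formula is a variable name (str) or a tuple
# (f, arg_1, ..., arg_n), where f is a Python callable standing in for
# an n-ary function from B.

def variables(phi):
    if isinstance(phi, str):
        return {phi}
    return set().union(*(variables(a) for a in phi[1:]))

def evaluate(phi, alpha):
    if isinstance(phi, str):
        return alpha[phi]
    f, *args = phi
    return f(*(evaluate(a, alpha) for a in args))

def size_l(phi):
    """Number of occurrences of variables (literals)."""
    if isinstance(phi, str):
        return 1
    return sum(size_l(a) for a in phi[1:])

def size_s(phi):
    """Number of occurrences of function symbols."""
    if isinstance(phi, str):
        return 0
    return 1 + sum(size_s(a) for a in phi[1:])

def equivalent(phi, psi):
    """Brute-force equivalence test over all assignments."""
    xs = sorted(variables(phi) | variables(psi))
    return all(
        evaluate(phi, dict(zip(xs, bits))) == evaluate(psi, dict(zip(xs, bits)))
        for bits in product((0, 1), repeat=len(xs))
    )

OR = lambda a, b: a | b
phi = (OR, (OR, "x", "y"), "x")   # (x or y) or x
psi = (OR, "x", "y")              # x or y
print(size_l(phi), size_s(phi), equivalent(phi, psi))
```

In this toy instance, `(phi, 2)` would be a positive instance under the literal-count measure, since `psi` is equivalent to `phi` and has two variable occurrences.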

For an $n$-ary Boolean function $f$, the function $\mathrm{dual}(f)$ is defined as
$\mathrm{dual}(f)(x_1, \dots, x_n) = \overline{f(\overline{x_1}, \dots, \overline{x_n})}$, i.e., $\mathrm{dual}(f)$ is the function obtained from
$f$ by exchanging the roles of the values 0 and 1 in the evaluation of $f$. Since the
minimization problem is trivially invariant under this transformation, we obtain the
following result (as usual, for a set $B$ of Boolean functions, with $\mathrm{dual}(B)$ we denote
the set $\{\mathrm{dual}(f) \mid f \in B\}$):

Proposition 2.1 Let $B$ be a finite set of Boolean functions, then
$\mathrm{MEE}^{F/C}_{s/l}(B) \equiv^{\log}_m \mathrm{MEE}^{F/C}_{s/l}(\mathrm{dual}(B))$.
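
The dual operation is easy to experiment with. The sketch below assumes Boolean functions are modelled as Python callables over the values 0 and 1; the helper names `dual` and `same_function` are hypothetical and chosen only for this example.

```python
from itertools import product

def dual(f):
    """dual(f)(x_1, ..., x_n) = NOT f(NOT x_1, ..., NOT x_n) on values 0/1."""
    return lambda *xs: 1 - f(*(1 - x for x in xs))

def same_function(f, g, n):
    """Compare two n-ary functions on all 2^n inputs."""
    return all(f(*xs) == g(*xs) for xs in product((0, 1), repeat=n))

OR = lambda x, y: x | y
AND = lambda x, y: x & y
XOR = lambda x, y: x ^ y
MAJ = lambda x, y, z: int(x + y + z >= 2)

print(same_function(dual(OR), AND, 2))        # OR and AND are dual to each other
print(same_function(dual(MAJ), MAJ, 3))       # majority is self-dual
print(same_function(dual(dual(XOR)), XOR, 2)) # dual is an involution
```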

2.1 Tractable Cases: Polynomial-Time algorithms 

An $n$-ary Boolean function $f$ is an OR-function if it is constant or if $f(x_1, \dots, x_n)$ is
equivalent to $x_{r_1} \vee x_{r_2} \vee \dots \vee x_{r_m}$ for a subset $\{x_{r_1}, x_{r_2}, \dots, x_{r_m}\} \subseteq \{x_1, \dots, x_n\}$.
AND- and XOR-functions are defined analogously. We show that formulas using only
these functions can be minimized easily:

Theorem 2.2 $\mathrm{MEE}^{F/C}_{s/l}(B)$ can be solved in polynomial time if $B$ contains only
OR-functions, only AND-functions, or only XOR-functions.

We mention that the theorem, as all of our results in this section, applies to all 
four combinations of F/C and s/l. We also stress that all algorithms in this paper do 
not only determine whether a formula with the given size restriction exists, but also 
compute a minimum equivalent formula. 

Proof. Let $\star$ denote the binary OR-operator if $B \subseteq V$, or the binary XOR-operator if
$B \subseteq L$ (the case $E$ follows from Proposition 2.1 since $\mathrm{dual}(V) = E$). We know that
every element of $B$ is of the form $c_0 \star c_1 x_1 \star \dots \star c_n x_n$, where the $c_i$ indicate which of
the $x_i$ is a relevant argument. Note that if $B \subseteq V$ and $c_0 = 1$, then none of the
arguments are relevant. Without loss of generality, assume that the first $l$ of the variables
are relevant. We then represent the function $f$ with the tuple $(c, l, n)$. We now show
how building formulas from the functions in $B$ can be expressed with arithmetic
operations on these tuples. Given two formulas representing the functions $f_1 = (c_1, l_1, n_1)$
and $f_2 = (c_2, l_2, n_2)$, we can, using the operations allowed in superposition, obtain
formulas representing the following:






Substituting $f_2$ for a relevant argument of $f_1$

$(c_1, l_1, n_1) \circ_{\mathrm{rel}} (c_2, l_2, n_2) = (c_1 \star c_2,\; l_1 + l_2 - 1,\; n_1 + n_2 - 1)$
applicable iff $l_1 \ge 1$

Substituting $f_2$ for an irrelevant argument of $f_1$

$(c_1, l_1, n_1) \circ_{\mathrm{irrel}} (c_2, l_2, n_2) = (c_1,\; l_1,\; n_1 + n_2 - 1)$
applicable iff $l_1 < n_1$

Identifying two relevant variables in $f_1$ in the case $B \subseteq V$

$(c_1, l_1, n_1) \to (c_1,\; l_1 - 1,\; n_1)$
applicable iff $l_1 \ge 2$

Identifying two relevant variables in $f_1$ in the case $B \subseteq L$

$(c_1, l_1, n_1) \to (c_1,\; l_1 - 2,\; n_1)$
applicable iff $l_1 \ge 2$

Note that identifying a relevant and an irrelevant variable comes down to simply 
renaming the irrelevant variable, and therefore is not of interest to us. 
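
The tuple operations above lend themselves to a fixed-point computation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `closure`, its parameters, and the arity cut-off `bound` are introduced here for illustration, and the operations are coded exactly as listed above, without any additional case analysis.

```python
def closure(base, star, identify_drop, bound):
    """All tuples (c, l, n) with n <= bound reachable from `base`.

    star          -- binary operation on the constants c (OR or XOR)
    identify_drop -- 1 for OR-functions, 2 for XOR-functions
    bound         -- assumed cut-off on the third component n
    """
    table = {t for t in base if t[2] <= bound}
    changed = True
    while changed:
        changed = False
        new = set()
        for (c1, l1, n1) in table:
            # Identify two relevant variables of f1.
            if l1 >= 2:
                new.add((c1, l1 - identify_drop, n1))
            for (c2, l2, n2) in table:
                n = n1 + n2 - 1
                if n > bound:
                    continue
                if l1 >= 1:   # substitute f2 for a relevant argument
                    new.add((star(c1, c2), l1 + l2 - 1, n))
                if l1 < n1:   # substitute f2 for an irrelevant argument
                    new.add((c1, l1, n))
        if not new <= table:
            table |= new
            changed = True
    return table

# Example: B contains only the binary OR, i.e. the tuple (0, 2, 2).
table = closure({(0, 2, 2)}, lambda a, b: a | b, identify_drop=1, bound=3)
print(sorted(table))
```

The loop terminates because only finitely many tuples satisfy the cut-off, and each round either adds a tuple or stops.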

We now describe the polynomial time algorithm. Assume that we are given a $B$-formula
$\varphi(x_1, \dots, x_n)$ and a natural number $k$. For the classes of functions that we are
looking at, it is easy to determine which variables of the formula are relevant: In both
cases, the $i$-th variable of $\varphi$ is relevant if and only if

$\varphi(0, \dots, 0, 0, 0, \dots, 0) \neq \varphi(0, \dots, 0, 1, 0, \dots, 0)$,

where the 1 is in the $i$-th position.

Without loss of generality, assume that the relevant variables of $\varphi$ are exactly the
variables $x_1, \dots, x_n$. Note that since $\varphi$ describes a function from $B$, we can again represent
$\varphi$ as $(c_\varphi, l_\varphi, n_\varphi)$ as above. The question if we can find a $B$-formula equivalent to $\varphi$
with less than $k$ variable occurrences is the same as the question if we can obtain, from
the tuples representing the functions in $B$, a tuple of the form $(c_\varphi, l_\varphi, n')$, where $n' > n$
and $l_\varphi + n' < k$ (we can then, by renaming the $n'$ irrelevant variables to $x_{l+1}, \dots, x_n$,
construct the equivalent formula).

It is obvious that if we have 0-ary constant functions in $B$, then we can remove
irrelevant variable occurrences from any $B$-formula by replacing them with the constant
functions, and hence $(\varphi, k)$ is a positive instance if and only if $l + n < k$. Therefore
assume without loss of generality that all of the functions in $B$ are of the form $(c, l, n)$
for $n \ge 1$. In this case, the operations defined above are all non-decreasing in $n$.
Hence we can simply generate a table containing all entries $(c, l, n)$ for $c \in \{0, 1\}$ and
$l, n \le \max(\{n' \mid \exists c', l' : (c', l', n') \in B\} \cup \{n_\varphi\})$, where an entry is set to true if and
only if a corresponding formula can be built from $B$. We start by setting all entries to
true which correspond to functions in $B$, and then simply apply the operations defined
above until no more changes occur in the table; then we check if an entry as
required is set to true. Since $n$ is smaller than the input, the table is of polynomial size,
and the procedure obviously can be performed in polynomial time.

Note that this also gives a polynomial time procedure if the set $B$ is part of the
input, if the functions in $B$ are given using the formula representations (essentially, the
tuples $(c, l, n)$ in unary). □

2.2 Hardness Results: Relationship to Satisfiability

The satisfiability problem for the formulas covered in Section 2.1 can easily be solved
in polynomial time. We now show that this is indeed a prerequisite for a tractable
minimization problem: formally, we prove that the complement of the satisfiability
problem (i.e., the set of all binary strings that are not positive instances of $\mathrm{SAT}(B)$)
reduces to the minimization problem.

Theorem 2.3 For every finite set $B$ of Boolean functions, $\overline{\mathrm{SAT}(B)} \le^{\log}_m \mathrm{MEE}^{F/C}_{s/l}(B)$.

Proof. Without loss of generality, assume that there is an unsatisfiable $B$-formula
$\psi$ (if such a formula does not exist, the result is trivial). We first state the reduction to
$\mathrm{MEE}^{F/C}_l(B)$.

For this, let $k = \mathrm{size}_l(\psi)$, and let $\varphi$ be a $B$-formula. We first test whether there
is an assignment that makes at most $k$ variables true and satisfies $\varphi$. In this case, the
reduction outputs a string that is not a positive instance of $\mathrm{MEE}^{F/C}_l(B)$. Otherwise,
we produce the instance $(\varphi, k)$.

The reduction can be performed in logarithmic space, since $k$ is constant and the
truth value of a formula can be determined in logarithmic space [Bus87]. We prove that
it is correct: First assume that $\varphi$ is unsatisfiable. In that case, the reduction produces the
string $(\varphi, k)$, which is a positive instance, since $\varphi$ is equivalent to $\psi$, and $\mathrm{size}_l(\psi) = k$.

Now assume that $\varphi$ is satisfiable. If there is an assignment that satisfies $\varphi$ and has
at most $k$ true variables, then the result of the reduction is not a positive instance of
$\mathrm{MEE}^{F/C}_l(B)$ by construction. Hence assume that this is not the case; then the result
of the reduction is $(\varphi, k)$. Assume, for contradiction, that this is a positive instance. Then $\varphi$ is
equivalent to a formula or circuit $\chi$ with at most $k$ literals. Since $\varphi$ is satisfiable, so is
$\chi$. Since at most $k$ literals appear in $\chi$, there is a satisfying assignment of $\chi$ (and thus
of $\varphi$) that sets at most $k$ variables to true, which is a contradiction.

The reduction to $\mathrm{MEE}^{F/C}_s(B)$ is analogous: Let $k = \mathrm{size}_s(\psi)$, and let $n$ be the
maximal number of variables in a formula $\chi$ with $\mathrm{size}_s(\chi) \le k$. Since there are only
finitely many formulas with this size, $n$ is a constant. The remainder of the proof is
identical to the above case, where instead of $k$ variables, we consider $n$ variables:

For an input formula $\varphi$, we first test whether there is an assignment that makes at
most $n$ variables true and satisfies $\varphi$. In this case, the reduction outputs a string that
is not a positive instance of $\mathrm{MEE}^{F/C}_s(B)$. Otherwise, we produce the instance $(\varphi, k)$.
Again, the reduction can be performed in logarithmic space.

If $\varphi$ is unsatisfiable, the reduction produces $(\varphi, k)$, which is a positive instance
as $\varphi \equiv \psi$. Hence assume $\varphi$ is satisfiable, and assume, for contradiction, that the reduction
produces a positive instance. In particular, $\varphi$ cannot be satisfied with at most $n$ variables
set to true, and the result of the reduction is $(\varphi, k)$. Hence there is a formula $\chi$ with
$\mathrm{size}_s(\chi) \le k$ and $\varphi \equiv \chi$. Since $\varphi$ is satisfiable, so is $\chi$, and since $\mathrm{size}_s(\chi) \le k$,
we know that at most $n$ variables appear in $\chi$. Hence $\chi$ (and thus $\varphi$) has a satisfying
assignment with at most $n$ variables set to true, a contradiction. □
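
The shape of this reduction can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: formulas are modelled as callables on assignment dictionaries, `NEGATIVE` is a placeholder for any fixed string that is not a positive instance, and the function name `reduce_unsat_to_mee` is hypothetical. The sketch only mirrors the case distinction of the proof; it does not attempt to run in logarithmic space.

```python
from itertools import combinations

NEGATIVE = "not-a-positive-instance"  # placeholder for a fixed non-member

def reduce_unsat_to_mee(phi, variables, k):
    """phi: callable mapping {variable: 0/1} to 0/1.
    k: the constant obtained from a fixed unsatisfiable formula."""
    # Try every assignment that sets at most k variables to true.
    # For constant k this is polynomially many assignments.
    for r in range(min(k, len(variables)) + 1):
        for ones in combinations(variables, r):
            alpha = {v: int(v in ones) for v in variables}
            if phi(alpha):
                return NEGATIVE
    return (phi, k)

unsat = lambda a: a["x"] & (1 - a["x"])
print(reduce_unsat_to_mee(unsat, ["x"], 2) == (unsat, 2))
print(reduce_unsat_to_mee(lambda a: a["x"], ["x"], 2))
```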

Using results on the complexity of $\mathrm{SAT}(B)$ [Lew79], we obtain hardness results
for a large class of sets $B$:

Corollary 2.4 Let $B$ be a finite set of Boolean functions such that there is a $B$-formula
that is equivalent to $x \wedge \overline{y}$. Then $\mathrm{MEE}^{F/C}_{s/l}(B)$ is coNP-hard.

Proof. This follows from Theorem 2.3 and the result shown in [Lew79], which proves
that $\mathrm{SAT}(B)$ is NP-complete for these choices of $B$. □

2.3 Hardness Results: Reducing from Equivalence 

The remaining cases are those where satisfiability is tractable, but which are not of the
forms covered by Theorem 2.2. We show that in these cases, minimization is coNP-hard
using a reduction from the equivalence problem for formulas, which asks to
determine whether two given formulas are equivalent. We first need a technical lemma
that will be used in our constructions later. In the following, a variable $x$ is relevant
for a function $f$ if the value of $f$ is in fact influenced by the value of the variable, i.e.,
if there exist assignments $\alpha$ and $\alpha'$ such that $\alpha$ and $\alpha'$ agree on all variables except $x$,
and $f(\alpha) \neq f(\alpha')$. Note that the size of the smallest formula is always at least as large
as that of the smallest circuit, hence the following result covers the circuit case as well.

Proposition 2.5 Let $B$ be a finite set of Boolean functions such that $B$ contains a
function that is at least binary. Let $m$ denote the maximal arity of a function in $B$, and
let $l > 1$. Then for every $B$-circuit $C$ with $m \cdot l$ relevant input variables, we have that

$\mathrm{size}_s(C) \ge l + 1$.

Proof. This follows trivially since a connected $B$-circuit with $l$ non-input gates can
only connect $m \cdot l - (l - 1) < m \cdot l$ input gates. □

The proof of the theorem below relies on the following idea: Given two formulas 
as input for the equivalence problem, we combine them into a single formula which
is "trivial" if the formulas are equivalent, but "complicated" otherwise. The "gap"
between the cases is large enough to yield a reduction to the minimization problem. 

Theorem 2.6 Let $B$ be a finite set of Boolean functions such that $\wedge \in [B]$ and
$\vee \in [B \cup \{1\}]$. Then $\mathrm{MEE}^{F/C}_{s/l}(B)$ is coNP-hard.

Proof. From Theorem 4.15 in [Rei01], we know that the problem of testing whether
two given $B$-formulas are equivalent is coNP-complete. We show that this problem
reduces to $\mathrm{MEE}^{F/C}_{s/l}(B)$. Since $\wedge \in [B]$ and $\vee \in [B \cup \{1\}]$, there are $B$-formulas
$f_\vee(x, y, t)$ and $f_\wedge(x, y)$ such that $f_\wedge(x, y) \equiv x \wedge y$, and $f_\vee(x, y, 1) \equiv x \vee y$. Let $m$
denote the maximal arity of a function in $B$. We first consider $\mathrm{MEE}^{F/C}_s(B)$.
Let $H_1$ and $H_2$ be $B$-formulas, and define

• $l = \mathrm{size}_s(f_\wedge(H_1, t))$; without loss of generality assume $l > 1$.

• $Z$ is a $B$-formula equivalent to $\bigwedge_{i=1}^{m \cdot l} z_i$ for new variables $z_i$.

• $G = f_\wedge(f_\vee(f_\wedge(H_1, H_2),\; f_\wedge(f_\vee(H_1, H_2, t), Z),\; t),\; t)$.

Note that $l$ is polynomial in the input, since $m$ is constant, and $f_\wedge(H_1, t)$ clearly
can be constructed. Hence the formula $Z$ can be computed in polynomial time as well,
since we can represent the conjunction over the $z_i$'s as a tree of logarithmic depth,
which grows only polynomially when repeatedly implementing $\wedge$ with $f_\wedge$.

Also note that by construction, we have

$G \equiv t \wedge \Bigl( (H_1 \wedge H_2) \vee \bigl( (H_1 \vee H_2) \wedge \bigwedge_{i=1}^{m \cdot l} z_i \bigr) \Bigr)$.

We claim that $H_1 \equiv H_2$ if and only if $(G, l) \in \mathrm{MEE}^{F/C}_s(B)$.

First assume that $H_1 \equiv H_2$. In this case, $G$ is equivalent to $t \wedge H_1$, which is
equivalent to $f_\wedge(H_1, t)$, and thus there is a $B$-formula/circuit equivalent to $G$ with size
$l$ by definition of $l$.

Now assume that $H_1 \not\equiv H_2$. Then there is an assignment $\alpha$ that, without loss of
generality, satisfies $H_1$ but not $H_2$. In this case, it easily follows that $G[\alpha]$ (i.e., $G$ with
the values for $\alpha$ hard-coded into the input gates, which is not necessarily a $B$-circuit
anymore) is equivalent to $t \wedge \bigwedge_{i=1}^{m \cdot l} z_i$. Therefore, in this case all of the $z_i$ are relevant
variables for $G$, and Proposition 2.5 implies that for every $B$-formula or circuit
$\chi$ equivalent to $G$ we have $\mathrm{size}_s(\chi) \ge l + 1$.

The proof for $\mathrm{MEE}^{F/C}_l(B)$ is identical, except in this case we choose $l$ as the number
of literals in $f_\wedge(H_1, t)$, and consider a conjunction of $l$ variables $z_i$. In the positive
case, the equivalent formula $f_\wedge(H_1, t)$ has $l$ literals; in the negative case, any formula
equivalent to $t \wedge \bigwedge_{i=1}^{l} z_i$ needs to have at least $l + 1$ literals. □
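
The behaviour of $G$ in the two cases of the proof can be checked by brute force. In the sketch below, the values of $H_1$, $H_2$, and $Z$ under a fixed assignment are treated as free bits `h1`, `h2`, `z`; the generator `all_f_or` enumerates every ternary function that agrees with OR when its last argument is 1 (its behaviour for last argument 0 is arbitrary). These modelling choices and all helper names are assumptions made for this illustration.

```python
from itertools import product

def f_and(x, y):
    return x & y

def all_f_or():
    """Every ternary g with g(x, y, 1) = x OR y; g(x, y, 0) is arbitrary."""
    for values in product((0, 1), repeat=4):
        at_zero = dict(zip(product((0, 1), repeat=2), values))
        yield lambda x, y, t, tbl=at_zero: (x | y) if t else tbl[(x, y)]

def G(f_or, h1, h2, z, t):
    return f_and(f_or(f_and(h1, h2), f_and(f_or(h1, h2, t), z), t), t)

def check():
    for f_or in all_f_or():
        for h1, h2, z, t in product((0, 1), repeat=4):
            expected = t & ((h1 & h2) | ((h1 | h2) & z))
            if G(f_or, h1, h2, z, t) != expected:
                return False
            # H1 and H2 agree: G collapses to t AND H1.
            if h1 == h2 and expected != (t & h1):
                return False
            # H1 true, H2 false: G collapses to t AND Z.
            if (h1, h2) == (1, 0) and expected != (t & z):
                return False
    return True

print(check())
```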

We now show an analogous hardness result for the case that $B$ can express the
ternary majority function. Algebraically, this condition is equivalent to $[B]$ containing
exactly the Boolean functions $f$ which are self-dual (i.e., $\mathrm{dual}(f)$ is the same function
as $f$) and monotone (i.e., if $\alpha_1 \le \beta_1, \dots, \alpha_n \le \beta_n$, then
$f(\alpha_1, \dots, \alpha_n) \le f(\beta_1, \dots, \beta_n)$).

Theorem 2.7 Let $B$ be a set of Boolean functions such that $\mathrm{maj} \in [B]$, where
$\mathrm{maj}(x, y, z) = 1$ if and only if $x + y + z \ge 2$. Then $\mathrm{MEE}^{F/C}_{s/l}(B)$ is coNP-hard.

Proof. We show that the equivalence problem for $B$-formulas, which is coNP-complete
due to Theorem 4.17 of [Rei01], reduces to $\mathrm{MEE}^{F/C}_{s/l}(B)$. Again, let $m$
denote the maximal arity of a function in $B$. Since $\mathrm{maj} \in [B]$, there is a $B$-formula
$f_{\mathrm{maj}}$ such that $f_{\mathrm{maj}}(x, y, z)$ is equivalent to $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$. We note that
$f_{\mathrm{maj}}(x, y, 0) \equiv x \wedge y$, and $f_{\mathrm{maj}}(x, y, 1) \equiv x \vee y$. To increase readability, we also
use the symbols $E$ and $V$ for $f_{\mathrm{maj}}$ when the last argument is assigned 0 or 1, respectively.
It follows that $E(x, y, 0) \equiv x \wedge y$ and $V(x, y, 1) \equiv x \vee y$. We first consider
$\mathrm{MEE}^{F/C}_s(B)$. Hence, let $H_1$ and $H_2$ be $B$-formulas. We construct the following:

• Let $l = \mathrm{size}_s(V(f, E(H_1, H_2, f), t))$, where $f$ and $t$ are new variables. Then
$l > 1$.

• Let $E^*$ be a formula with variables $z_1, \dots, z_{m \cdot l}, f$, such that

$E^*(z_1, \dots, z_{m \cdot l}, 0) \equiv \bigwedge_{i=1}^{m \cdot l} z_i$.

• Let $H$ be the formula

$V(V(f, E(H_1, H_2, f), t),\; E(E(t, V(H_1, H_2, t), f), E^*, f),\; t)$.

Obviously, $l$ is polynomial in the input and the formula $E^*$ can be computed as
follows: Construct the formula $\bigwedge_{i=1}^{m \cdot l} z_i$ as a $\wedge$-tree of logarithmic depth, and substitute
each $\wedge$ with its implementation using $f_{\mathrm{maj}}$ and $f$. Then the representation of $E^*$ can
be computed in polynomial time.

We claim that $H_1 \equiv H_2$ if and only if $(H, l) \in \mathrm{MEE}^{F/C}_s(B)$. We consider all
possible truth assignments for $f$ and $t$:

If $t = f = 0$, then $V(f, E(H_1, H_2, f), t) \equiv 0$, and $E(t, V(H_1, H_2, t), f) \equiv 0$.

If $t = f = 1$, then $V(f, E(H_1, H_2, f), t) \equiv 1$, and $E(t, V(H_1, H_2, t), f) \equiv 1$.

If $f = 0$ and $t = 1$, then $V(f, E(H_1, H_2, f), t) \equiv H_1 \wedge H_2$,
and $E(t, V(H_1, H_2, t), f) \equiv H_1 \vee H_2$.

If $f = 1$ and $t = 0$, then $V(f, E(H_1, H_2, f), t) \equiv H_1 \vee H_2$,
and $E(t, V(H_1, H_2, t), f) \equiv H_1 \wedge H_2$.

In all cases we obtain that if $H_1 \equiv H_2$, then $H \equiv V(f, E(H_1, H_2, f), t)$.
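
The four cases can be confirmed mechanically. In the following sketch the values of $H_1$, $H_2$, and $E^*$ under a fixed assignment are modelled as free bits `h1`, `h2`, `e_star`, and both $E$ and $V$ are the ternary majority; the helper names `first`, `second`, `cases_hold`, `collapses_when_equal`, and `exposes_e_star` are assumptions introduced for this illustration.

```python
from itertools import product

def maj(x, y, z):
    return int(x + y + z >= 2)

E = V = maj  # E and V are both names for the majority formula

def first(h1, h2, f, t):
    return V(f, E(h1, h2, f), t)

def second(h1, h2, f, t):
    return E(t, V(h1, h2, t), f)

def H(h1, h2, e_star, f, t):
    return V(first(h1, h2, f, t), E(second(h1, h2, f, t), e_star, f), t)

def cases_hold():
    ok = True
    for h1, h2 in product((0, 1), repeat=2):
        ok &= first(h1, h2, 0, 0) == 0 and second(h1, h2, 0, 0) == 0
        ok &= first(h1, h2, 1, 1) == 1 and second(h1, h2, 1, 1) == 1
        ok &= first(h1, h2, 0, 1) == (h1 & h2) and second(h1, h2, 0, 1) == (h1 | h2)
        ok &= first(h1, h2, 1, 0) == (h1 | h2) and second(h1, h2, 1, 0) == (h1 & h2)
    return bool(ok)

def collapses_when_equal():
    # If H1 and H2 take the same value, H agrees with V(f, E(H1, H2, f), t).
    return all(
        H(h, h, e, f, t) == first(h, h, f, t)
        for h, e, f, t in product((0, 1), repeat=4)
    )

def exposes_e_star():
    # With H1 = 1, H2 = 0, f = 0, t = 1, the value of H is the value of E*.
    return all(H(1, 0, e, 0, 1) == e for e in (0, 1))

print(cases_hold(), collapses_when_equal(), exposes_e_star())
```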

First assume that $H_1 \equiv H_2$. In this case, from the definition of $l$ it follows that
there is a $B$-formula equivalent to $H$ with size at most $l$.

Now assume that $H_1 \not\equiv H_2$; then there is an assignment $\alpha$ such that, without loss
of generality, $\alpha$ satisfies $H_1$ and does not satisfy $H_2$. We extend $\alpha$ with $\alpha(f) = 0$ and
$\alpha(t) = 1$. We then have that $H[\alpha]$ (again, this is $H$ with the values for $\alpha$ hard-coded
into it, which is not necessarily a $B$-circuit) is equivalent to $\bigwedge_{i=1}^{m \cdot l} z_i$, and therefore every
$z_i$ is a relevant variable in $H$. Therefore, Proposition 2.5 implies that every $B$-circuit
equivalent to $H$ has size at least $l + 1$.

The proof for $\mathrm{MEE}^{F/C}_l(B)$ is identical, except in this case we choose $l$ as the number
of literals in $V(f, E(H_1, H_2, f), t)$, and consider a conjunction of $l + 1$ variables
$z_i$. In the positive case, the equivalent formula $V(f, E(H_1, H_2, f), t)$ has $l$ literals;
in the negative case, any formula equivalent to $\bigwedge_{i=1}^{l+1} z_i$ needs to have at least $l + 1$
literals. □

2.4 Classification Theorem 

From the structure of Post's lattice [Pos41] (see [BCRV03] for a summary), it follows
that if $B$ is a finite set of Boolean functions that contains functions $f$, $g$, and $h$ such that
$f$ is not an OR-function, $g$ is not an AND-function, and $h$ is not an XOR-function,
then one of the following is true:

1. $\wedge \in [B]$ and $\vee \in [B \cup \{1\}]$, or

2. $\mathrm{maj} \in [B]$, where maj is the ternary majority function.

In both of these cases, the above two theorems imply that the minimization problem
is coNP-hard. Hence, the problem is coNP-hard for all cases except those covered by
our polynomial-time results in Section 2.1. We therefore obtain the following full
classification:

Corollary 2.8 Let $B$ be a finite set of Boolean functions.

• If $B$ contains only OR-functions, only AND-functions, or only XOR-functions,
then $\mathrm{MEE}^{F/C}_{s/l}(B)$ can be solved in polynomial time.

• Otherwise, $\mathrm{MEE}^{F/C}_{s/l}(B)$ is coNP-hard.

3 Minimization in the CNF framework 

Constraint formulas are CNF-formulas, where the set of allowed types of clauses is
defined in a constraint language $\Gamma$, which is a finite set of non-empty finitary Boolean
relations. A $\Gamma$-clause is of the form $R(x_1, \dots, x_n)$, where $R$ is an $n$-ary relation
from $\Gamma$, and $x_1, \dots, x_n$ are variables. A $\Gamma$-formula $\varphi$ is a conjunction of $\Gamma$-clauses; it
is satisfied by an assignment $\alpha$ if for every clause $R(x_1, \dots, x_n)$ in $\varphi$ we have that
$(\alpha(x_1), \dots, \alpha(x_n)) \in R$. A relation $R$ is expressed by a formula $\varphi$ if the tuples in the
relation are exactly the solutions of the formula (assuming a canonical order on the
variables). We denote the satisfiability problem for $\Gamma$-formulas with $\mathrm{SAT}(\Gamma)$. We often
identify a relation and the formula expressing it. For a constraint language $\Gamma$, we define
the constraint language $\overline{\Gamma}$, which is obtained from $\Gamma$ by exchanging 0 and 1 in every
relation in $\Gamma$. This language is also called the dual of $\Gamma$.

A natural way to measure the size of a CNF formula is the number of clauses — for 
a fixed language $\Gamma$, this is linearly related to the number of variable occurrences. We
thus consider the following problem: 

Problem: $\mathrm{MEE}(\Gamma)$

Input: A $\Gamma$-formula $\varphi$, an integer $k$

Question: Is there a $\Gamma$-formula $\psi$ with at most $k$ clauses and $\psi \equiv \varphi$?

First note that obviously, the duality between $\Gamma$ and $\overline{\Gamma}$ directly results in these
languages leading to the same complexity:

Proposition 3.1 Let $\Gamma$ be a constraint language. Then $\mathrm{MEE}(\Gamma) \equiv^{\log}_m \mathrm{MEE}(\overline{\Gamma})$.

To state our classification, we recall relevant properties of Boolean relations (for 
more background on these properties and how they relate to complexity classifications 
of constraint-related problems, see e.g., [CKS01]).

1. A relation is affine if it can be expressed by a
$\{x, \overline{x}, x_1 \oplus \dots \oplus x_n, \neg(x_1 \oplus \dots \oplus x_n) \mid n \in \mathbb{N}\}$-formula.

2. A relation is bijunctive if it can be expressed by a $\Gamma_2$-formula, where $\Gamma_2$ is the
set of binary Boolean relations.

3. A relation is Horn if it can be expressed by a
$\{(x_1 \wedge \dots \wedge x_n \to y), (\overline{x_1 \wedge \dots \wedge x_n}) \mid n \in \mathbb{N}\}$-formula.

4. A relation is positive Horn if it can be expressed by a
$\{x_1 \wedge \dots \wedge x_n \to y \mid n \in \mathbb{N}\}$-formula.

5. A relation is IHSB+ if it can be expressed by a
$\{x, \overline{x}, x \to y, (x_1 \vee \dots \vee x_n) \mid n \in \mathbb{N}\}$-formula.

A constraint language $\Gamma$ is affine, bijunctive, IHSB+, or (positive) Horn if every
relation in $\Gamma$ has this property. $\Gamma$ is dual (positive) Horn if $\overline{\Gamma}$ is (positive) Horn, and IHSB−
if $\overline{\Gamma}$ is IHSB+. Note that IHSB+ implies dual Horn, and IHSB− implies Horn.
Additionally, $\Gamma$ is Schaefer if it is affine, bijunctive, Horn, or dual Horn. This property
implies tractability of many problems for Boolean constraint languages, including
satisfiability [Sch78], equivalence [BHRV02], and enumeration [CH97]. For the latter two,
the Schaefer property is necessary for tractability, unless P = NP.

3.1 Irreducible Relations 

For many problems in the constraint context, it can be shown that if two constraint
languages $\Gamma_1$ and $\Gamma_2$ have the same "expressive power" (with regard to different notions
of expressibility), then the problems for $\Gamma_1$ and $\Gamma_2$ have the same complexity. A lot
of work has been done on categorizing relations with regard to their expressive power,
which is related to certain algebraic closure properties of the involved relations. For a
discussion of the relationship between different notions of expressiveness, see [SS08].
One of the strictest notions of "expressive power" is the following: We say that
constraint languages $\Gamma_1$ and $\Gamma_2$ have the same expressive power if and only if every relation
in $\Gamma_1$ can be expressed by a $\Gamma_2$-formula and vice versa. This notion of expressiveness
has been studied in [CKZ07]. It arises naturally in many complexity considerations
for constraint-related problems: If two constraint languages have the same expressive 
power, then one can easily show that formulas can be "translated" from one language 
to the other with little computational cost. Hence it is natural that the complexity for all 
computational problems where the answer remains invariant if input formulas are ex- 
changed for equivalent ones will then be the same — this includes satisfiability, equiva- 
lence, enumeration, and many other problems that have been considered. However, the 
minimization problem behaves differently: When translating formulas between differ- 
ent constraint languages, the number of clauses does not remain invariant. Moreover, 
even for constraint languages $\Gamma_1$ and $\Gamma_2$ with the same expressive power, expressing
the same relation can possibly be done more efficiently using the language $\Gamma_1$ than
using $\Gamma_2$. Therefore an easy proof showing that languages with the same expressive
power lead to minimization problems with the same complexities cannot be expected.
In fact, we show that the statement is not even true, by exhibiting constraint languages
which have the same expressive power, yet whose minimization problems have
different complexities. However, our complexity classification obtained later still heavily
relies on the characterization of Boolean relations along the above lines. 
Example 3.2 Let $\Gamma_1 := \{x \vee y\}$ and $\Gamma_2 := \{(x \vee y), (x \vee (y \wedge z)), (x \vee (y \wedge z \wedge w))\}$.
Then obviously, every $\Gamma_2$-formula can be rewritten as a $\Gamma_1$-formula and vice versa,
using the equivalence $y \vee (x_1 \wedge \dots \wedge x_n) \equiv (y \vee x_1) \wedge \dots \wedge (y \vee x_n)$. However,
while the problem $\mathrm{MEE}(\Gamma_1)$ can obviously be solved in polynomial time, the problem
$\mathrm{MEE}(\Gamma_2)$ is NP-hard. This follows from a reduction similar in flavor to the one used
in the proof of NP-hardness of MEE in [HW02], reducing from the Vertex Cover for
cubic graphs problem: A cubic graph $G = (V, E)$ has a vertex cover of size $k$ if and
only if the formula $\bigwedge_{\{i,j\} \in E} (x_i \vee x_j)$ has an equivalent $\Gamma_2$-formula with $k$ clauses.
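
The rewriting between the two languages in Example 3.2 rests on the distributive equivalence stated there. As a small sanity check, the following Python sketch verifies that equivalence by exhaustive truth-table enumeration for the three arities occurring in the second language. The function names are illustrative assumptions and not part of the formal development.

```python
from itertools import product


def gamma1_form(y, xs):
    """Conjunction of binary OR-clauses (y or x_i): the rewriting into binary clauses."""
    return all(y or x for x in xs)


def gamma2_form(y, xs):
    """Single combined clause y or (x_1 and ... and x_n)."""
    return y or all(xs)


def equivalent(n):
    """Compare both forms on all 2^(n+1) assignments to y, x_1, ..., x_n."""
    return all(
        gamma1_form(bits[0], bits[1:]) == gamma2_form(bits[0], bits[1:])
        for bits in product([False, True], repeat=n + 1)
    )


# n = 1, 2, 3 corresponds to the three relations of the second language.
checked = {n: equivalent(n) for n in (1, 2, 3)}
```

The check only confirms that the two languages express the same relations; it says nothing about how many clauses either language needs, which is exactly where the two minimization problems differ.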

Therefore, unlike all other problems in the constraint context that we mentioned, 
the complexity of the minimization problem is not determined by the expressive power 
of a constraint language. However, the problems that we need to solve in order to
minimize $\Gamma_2$-formulas are combinatorial in nature, and do not stem from the difficulty
of determining a "minimum representation" of what these formulas actually express.
Therefore the NP-hardness does not stem from the problem that we are interested in when
studying minimization, namely to find a shortest equivalent formula, but from the difficulty of
how to use the "building blocks" that we have efficiently. While this certainly is an
interesting problem in its own right, in this paper we only study the complexity of the 
actual task of finding — not expressing — a minimum representation of a formula in the 
given constraint language. 

Looking at the example given above, the problems obviously arise from the fact
that $\Gamma_2$ contains "combined" relations which can be rewritten into simpler clauses: the
clause $(x \vee (y \wedge z))$ is equivalent to $(x \vee y) \wedge (x \vee z)$. An important feature in the study
of constraint satisfaction problems is that they allow us to build formulas from "local
conditions," which are expressed in the individual clauses. The clause $(x \vee (y \wedge z))$ is
in a way not "as local as it can be," since it can be rewritten as the conjunction of two
"easier" conditions. We define irreducible relations as those that cannot be rewritten
like this:

Definition An $n$-ary relation $R$ is irreducible, if for every formula
$R_1(x^1_1, \dots, x^1_{k_1}) \wedge \dots \wedge R_m(x^m_1, \dots, x^m_{k_m})$ (where each $R_i$ is a $k_i$-ary Boolean
relation) which is equivalent to $R(x_1, \dots, x_n)$, one of the $R_i$-clauses has arity at least
$n$. A constraint language $\Gamma$ is irreducible if every relation in $\Gamma$ is.

The intuition behind the definition is that a relation $R$ is irreducible if the question
whether some tuple $(a_1, \dots, a_n)$ belongs to $R$ cannot be answered by checking independent
conditions which each only depend on a proper subset of the values, but all of the $a_i$
have to be considered simultaneously. Note that the clause which has arity $\geq n$ can be
assumed to contain every variable of $x_1, \dots, x_n$, since otherwise, a variable appears
twice in the clause, which then could be rewritten with a relation of a smaller arity.
Hence a relation is irreducible if and only if every formula equivalent to $R(x_1, \dots, x_n)$
has a clause that contains (at least) all of the variables $x_1, \dots, x_n$.
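
For small relations, irreducibility can be tested by brute force. The sketch below relies on an observation that is ours and not spelled out in the text: for $n \geq 2$, a non-empty relation is reducible exactly when it coincides with the set of all tuples consistent with each of its projections onto $n-1$ coordinates, since these projections are the strongest conditions that depend on a proper subset of the values. The representation of relations as sets of 0/1 tuples and all names are assumptions made for illustration.

```python
from itertools import combinations, product


def project(relation, positions):
    """Project a set of Boolean tuples onto the given coordinate positions."""
    return {tuple(t[p] for p in positions) for t in relation}


def is_irreducible(relation, n):
    """Brute-force test, assuming n >= 2 and a non-empty set of n-tuples over {0, 1}."""
    subsets = list(combinations(range(n), n - 1))
    projections = {s: project(relation, s) for s in subsets}
    # All n-tuples that are consistent with every (n-1)-ary projection.
    join = {
        t
        for t in product((0, 1), repeat=n)
        if all(tuple(t[p] for p in s) in projections[s] for s in subsets)
    }
    # Reducible exactly when the "local" conditions already determine the relation.
    return join != relation


OR3 = set(product((0, 1), repeat=3)) - {(0, 0, 0)}                               # x or y or z
COMBINED = {t for t in product((0, 1), repeat=3) if t[0] or (t[1] and t[2])}      # x or (y and z)
XOR2 = {(0, 1), (1, 0)}                                                          # x xor y
```

Under this test the ternary OR and the binary exclusive OR come out irreducible, whereas the combined clause from Example 3.2 does not. The enumeration is exponential in $n$ and is meant only for experimenting with small relations.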

We mention that it is not sufficient to replace, in the above definition of irreducibility,
the "has arity at least $n$" with "has arity $n$." In this case, no relation would be
irreducible at all, since $R(x_1, \dots, x_n)$ can always be expressed with the $(n+1)$-ary term
$R'(x_1, \dots, x_n, x_n)$, where $R' = \{(x_1, \dots, x_n, y) \mid (x_1, \dots, x_n) \in R \text{ and } x_n = y\}$.
We thank the anonymous reviewer of [HS11] for pointing out this issue.

Irreducibility is a rather natural condition. In fact, most relations usually considered
in the constraint context meet this definition:

Example 3.3 1. Let $R$ be expressed by a disjunction of $n$ literals with $n$ distinct
variables. Then $R$ is irreducible.

Proof. Let $R$ be expressed by $(l_1 \vee \dots \vee l_n)$, where $l_i$ is either $x_i$ or $\overline{x_i}$ for $n$
distinct variables $x_1, \dots, x_n$. Let $R_1(x^1_1, \dots, x^1_{k_1}) \wedge \dots \wedge R_m(x^m_1, \dots, x^m_{k_m})$
be a formula equivalent to $R(x_1, \dots, x_n)$. We need to show that there is one
clause $R_i(\dots)$ where each of the $x_i$ appears. Let $I$ be the assignment to the
variables $x_1, \dots, x_n$ such that $I(x_i) = 1$ if $l_i$ is the literal $\overline{x_i}$, and $I(x_i) = 0$ if
$l_i$ is the literal $x_i$. Then $I$ does not satisfy the clause $R(x_1, \dots, x_n)$. Therefore
there must be a clause $R_j(x^j_1, \dots, x^j_{k_j})$ not satisfied by $I$. We show that each
variable $x_i$ appears in this clause, which then completes the proof. Let $I'$ be
the assignment agreeing with $I$ for all variables except for $x_i$. Then $I'$ satisfies
$R(x_1, \dots, x_n)$, and hence satisfies $R_j(x^j_1, \dots, x^j_{k_j})$. Since $I$ and $I'$ only differ
in the variable $x_i$, this variable must appear in $R_j(x^j_1, \dots, x^j_{k_j})$. □

2. Let $R$ be expressed by a clause $x_1 \oplus \dots \oplus x_k = c$ for distinct variables $x_1, \dots, x_k$
and a constant $c \in \{0, 1\}$. Then $R$ is irreducible.

Proof. Similar to the above: Let $\varphi$ be a conjunction of clauses equivalent to
$R(x_1, \dots, x_k)$. Fix an assignment $I$ not satisfying $\varphi$. Then there must be a clause
in $\varphi$ not satisfied by $I$, but changing the truth value of any of the variables makes
the formula (and hence this clause) satisfied, therefore every variable appears in
the clause. □

3. Every relation appearing in the base-list given in liCKZOTH is irreducible. 

The above list certainly is not exhaustive. Irreducible languages only allow
"atomic" clauses that cannot be split up further. In practice, for example in the design
of knowledge bases, irreducible languages are more likely to be used: They provide
users with atomic constructs as a basis from which more complex expressions can be
built. We have seen above that there are languages with equal expressive power (meaning
that formulas can be easily rewritten from one of the languages to the other), but
the irreducible one has an easier minimization problem than the non-irreducible one.
We do not expect that an example exists for the converse, where the problem is easier
for the reducible case than for the irreducible, for the reason discussed above: In an
informal way, when considering the minimization problem for irreducible languages, it
is sufficient to find some minimum representation for the formula. The task to express
this formula using a minimum number of clauses of the given constraint language
is easy. As seen in Example 3.2, in the case of reducible languages, this second task
can be NP-hard in itself.
3.2 Tractable Cases: Polynomial-Time algorithms



We now prove polynomial-time results for a wide class of constraint languages. In fact,
we prove the maximum of polynomial-time results that can be expected: As mentioned
before, the MEE problem for positive Horn formulas is NP-complete [Bv94]. We
show the following result: For every irreducible constraint language that is Schaefer,
and does not have all the expressive power of positive Horn (or dual positive Horn), the
minimization problem can be solved efficiently. This proves polynomial-time results
in each case where such a result can be expected, since for non-Schaefer languages,
even testing equivalence is coNP-hard. Following well-known classification results
about the structure of Boolean constraint languages, there are three cases to consider
(ignoring the isomorphic cases arising from duality, see Proposition 3.1): The case where
$\Gamma$ is affine, bijunctive, or IHSB+. For each of these cases, we prove that the
minimization problem can be solved efficiently. The most interesting and involved case
is for constraint languages that are IHSB+. The bijunctive case is a simpler version
of the IHSB+ case; the affine case is tractable due to the fact that formulas involving
affine constraint languages can be seen as linear equations, for which there are efficient
algorithms.

3.2.1 IHSB+ and IHSB- formulas 

We start our polynomial-time results with the most involved of these constructions,
proving that irreducible constraint languages that are IHSB+ lead to an easy
minimization problem (from Proposition 3.1 it follows that the problem is polynomial-time
solvable for IHSB- as well). We first prove the result for the basic case where the
relations in our constraint language are restricted to the ones "defining" IHSB+, and later
prove that this case already is general enough to cover all irreducible languages that
are IHSB+. Requiring irreducibility is necessary: The language $\Gamma_2$ discussed in
Example 3.2 is IHSB+ (in fact, considerably less expressive than IHSB+), but, as argued
before, the minimization problem for $\Gamma_2$ is NP-hard.

The main idea of the algorithm is the following: We rewrite formulas using multi- 
ary OR, implication, equality, and literals into conjunctions of, to a large degree, inde- 
pendent formulas, each containing only OR, implications, equalities, or literals. Each 
of these formulas then can be minimized locally with relatively easy algorithms. The 
main task that our algorithm performs is "separating" the components of the input for- 
mula in such a way that minimizing the mentioned sub-formulas locally is equivalent 
to minimizing the entire formula. 

Theorem 3.4 Let $\Gamma = \{\rightarrow, =, x, \overline{x}\} \cup \{\mathrm{OR}^m \mid m \leq k\}$ for some $k \in \mathbb{N}$. Then
$\mathrm{MEE}(\Gamma) \in \mathrm{P}$.

Proof. We first introduce some notation and facts about $\Gamma$-formulas: For variables $u$
and $v$, we write $u \leadsto_{\varphi} v$ ($u$ leads to $v$ in $\varphi$) if there is a directed path consisting
of $\rightarrow$- and $=$-clauses in the formula from $u$ to $v$. We often omit the formula and
simply write $u \leadsto v$. Similarly, if there are OR-clauses $C_1 = (x_1 \vee \dots \vee x_n)$ and
$C_2 = (y_1 \vee \dots \vee y_m)$, we write $C_1 \leadsto C_2$ if every one of the $x_i$ leads to one of the $y_j$. It
is obvious that in this case, the conjunction of $C_1$ and the $\rightarrow$/$=$-clauses implies $C_2$.
Note that since $x \leadsto x$ for all variables, it holds that $(x_1 \vee x_2) \leadsto (x_1 \vee x_2 \vee x_3)$. In
particular, a literal $x$ leads to a clause $(x \vee y \vee z)$.

It is easy to see that a $\Gamma$-formula is unsatisfiable if and only if there is some
OR-clause (we regard literals as 1-ary OR-clauses) $x_1 \vee \dots \vee x_n$ such that for each of the
$x_i$, there is a variable $z_i$ which occurs as a negative literal, and $x_i \leadsto z_i$ (otherwise,
we can satisfy the formula by setting all variables to 1 which do not imply negative
literals). Since satisfiability for $\Gamma$-formulas can be tested in polynomial time [Sch78],
we assume that all occurring formulas are satisfiable (otherwise in order to minimize
we produce a minimum unsatisfiable $\Gamma$-formula, which is a fixed string). For any
$\Gamma$-formula $\varphi$, let $\varphi_{\mathrm{OR}}$ denote the formula obtained from $\varphi$ by removing every clause that
is not an OR-clause with at least 2 variables. Similarly let $\varphi_{\rightarrow}$ be the conjunction of all
implication-clauses in $\varphi$, $\varphi_{\mathrm{lit}}$ the literals in $\varphi$, and $\varphi_{=}$ the equality clauses.
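
The "leads to" relation and the unsatisfiability criterion above can be prototyped directly. The following Python sketch is only an illustration under assumptions of ours: a formula is given as a list of implication pairs, a list of equality pairs, a list of OR-clauses (tuples of variable names, with positive literals as 1-tuples), and a set of variables occurring as negative literals. It cross-checks the path-based criterion against exhaustive search on a small example.

```python
from itertools import product


def reachable(start, implications, equalities):
    """All v with start ~> v: paths over ->-clauses and (bidirectional) =-clauses."""
    edges = {}
    for u, v in implications:
        edges.setdefault(u, set()).add(v)
    for u, v in equalities:
        edges.setdefault(u, set()).add(v)
        edges.setdefault(v, set()).add(u)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen  # contains start itself, since x ~> x


def clause_leads_to(c1, c2, implications, equalities):
    """C1 ~> C2: every variable of C1 leads to some variable of C2."""
    return all(reachable(x, implications, equalities) & set(c2) for x in c1)


def is_unsatisfiable(or_clauses, neg_literals, implications, equalities):
    """Criterion from the text: some OR-clause whose variables all lead to negative literals."""
    return any(
        all(reachable(x, implications, equalities) & neg_literals for x in clause)
        for clause in or_clauses
    )


def brute_force_unsat(variables, or_clauses, neg_literals, implications, equalities):
    """Reference check by enumerating all assignments (exponential, for testing only)."""
    for bits in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, bits))
        if (all(any(a[x] for x in c) for c in or_clauses)
                and all(a[x] == 0 for x in neg_literals)
                and all(a[u] <= a[v] for u, v in implications)
                and all(a[u] == a[v] for u, v in equalities)):
            return False
    return True


VARS = ["a", "b", "c", "d"]
IMPL = [("a", "c"), ("b", "d")]
CLAUSES = [("a", "b")]
```

With negative literals on both `c` and `d` the example formula is unsatisfiable, while with a negative literal on `c` alone it is satisfiable; both functions agree on these cases.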

We now describe the minimization procedure. We use some canonical way of or- 
dering variables and clauses (for example, the lexicographical ordering on the names) 
and repeat the following steps until no changes occur anymore; 

1: Input: $\Gamma$-formula $\varphi$
2: while changes still occur do
3:    For a set of variables connected with $=$, only keep the minimal variable in
      non-equality clauses (by variable identification)
4:    if there exist OR-clauses $C_1 \neq C_2$ with $C_1 \leadsto C_2$, then
5:       If $C_2 \leadsto C_1$, then remove the minimal of the two
6:       Otherwise, remove $C_2$
7:    end if
8:    if there is clause $(x_1 \vee \dots \vee x_n)$, variable $v$ with $x_i \leadsto v$ for all $i$, then
9:       introduce clause $v$
10:      remove $\rightarrow$-clauses leading to $v$
11:   end if
12:   if literal $x$ occurs, $x \leadsto y$ then
13:      replace final clause in path with $y$
14:   end if
15:   if literal $\overline{y}$ occurs, $x \leadsto y$ then
16:      replace first clause in path with $\overline{x}$
17:   end if
18:   Remove variables occurring as negative literals from OR-clauses
19:   if $(x_1 \vee \dots \vee x_n)$ is clause, $x_i \leadsto x_j$ for $i \neq j$ then
20:      remove $x_i$ from the clause
21:   end if
22:   if there are variables such that $x_1 \leadsto x_2, \dots, x_{n-1} \leadsto x_n, x_n \leadsto x_1$ then
23:      exchange implications between them with equalities.
24:   end if
25:   if $u$ ($\overline{u}$) appears as a literal then
26:      remove clauses of the form $(v \rightarrow u)$ ($(u \rightarrow v)$).
27:   end if
28:   Locally minimize $\varphi_{\rightarrow}$ and $\varphi_{=}$.
29: end while
Note that $\varphi_{=}$ and $\varphi_{\mathrm{lit}}$ can be minimized trivially, and $\varphi_{\rightarrow}$ can be minimized due
to a result from [AGU72], since finding a transitive reduction of a directed graph is
exactly the problem of minimizing a formula in which only implications of positive
literals appear.
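
To illustrate the connection to transitive reduction, the following Python sketch removes redundant implication clauses from an acyclic implication graph. It is a simple edge-by-edge reachability check written for readability, not the algorithm of [AGU72], and it assumes that cycles have already been replaced by equalities, so that the graph is a DAG. The function name and the edge-list representation are our own.

```python
def transitive_reduction(nodes, edges):
    """Return the non-redundant implications (u, v) of an acyclic implication graph."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    def reaches(src, dst, skip_edge):
        """Is dst reachable from src without using skip_edge?"""
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    kept = []
    for u, v in sorted(set(edges)):
        # (u -> v) is implied by the other clauses iff v is reachable without this edge.
        if not reaches(u, v, (u, v)):
            kept.append((u, v))
    return kept


# (a -> c) follows from (a -> b) and (b -> c) and is therefore dropped.
reduced = transitive_reduction(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
```

Because the graph is assumed to be acyclic, testing every edge against the original graph is sufficient: an edge survives exactly when no longer path connects its endpoints.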

For a $\Gamma$-formula $\varphi$, let $\min(\varphi)$ denote the result of this optimization procedure on
input $\varphi$. It is obvious that $\min(\varphi)$ is equivalent to $\varphi$, and that the algorithm can be
performed in polynomial time. It is also obvious that the number of clauses of $\min(\varphi)$
does not exceed the number of clauses of $\varphi$. This is clear from the definition of the
algorithm except for step 9. In this case, the number of clauses could grow if there
is no $\rightarrow$-clause that we can remove. However, in this case, all of the variables in the
OR-clause are $=$-connected with $v$, and therefore the clause is equivalent to $v$ and can
be removed. Therefore, the number of clauses in $\varphi$ does not increase when applying
the algorithm.

The main idea of the algorithm is that it brings the formulas in a "normal form," 
allowing us to minimize the components of the formula separately. The proof depends 
on the following claims: 

Fact 1 Let $\chi$ be a satisfiable $\Gamma$-formula such that $\min(\chi) = \chi$ and $\chi$ implies $x$ ($\overline{x}$) for
some variable $x$. Then $\chi_{\mathrm{lit}}$ implies $x$ ($\overline{x}$).

Proof. First consider the case that $\chi$ implies $x$. Then the formula $\chi \wedge \overline{x}$ is not satisfiable.
Therefore, there is an OR-clause $(x_1 \vee \dots \vee x_n)$ such that each $x_i$ leads to a variable
$z_i$ which appears as a negative literal. No $x_i$ can lead to a negative literal appearing
in $\chi$, since such variables are removed from OR-clauses by the algorithm in steps 16
and 18. Hence every $x_i$ leads to $x$. Therefore, $x$ is present as a literal due to step 9 of
the minimization algorithm.

Now assume that $\chi$ implies $\overline{x}$. Then $\chi \wedge x$ is not satisfiable. Hence there must be an
OR-clause where every appearing variable leads to a variable occurring as a negative
literal. Since $\chi$ is satisfiable, this OR-clause must be the single literal $x$. Hence in
$\chi$, $x$ leads to a variable $y$ occurring as a negative literal. By the construction of the
minimization algorithm, $\overline{x}$ then also appears as a literal. □

The following facts are proven using similar arguments: 

Fact 2 Let $\chi$ be a satisfiable $\Gamma$-formula such that $\min(\chi) = \chi$, and let $u, v$ be variables
in $\chi$ such that $\chi$ implies $(u \rightarrow v)$. Then $\chi_{\rightarrow} \wedge \chi_{\mathrm{lit}} \wedge \chi_{=}$ implies $(u \rightarrow v)$.

Proof. Assume that this is not the case. It then follows that $\chi \wedge u \wedge \overline{v}$ is not satisfiable,
and $\chi_{\rightarrow} \wedge \chi_{\mathrm{lit}} \wedge \chi_{=} \wedge u \wedge \overline{v}$ is. Since the former is unsatisfiable, there is an OR-clause
$(x_1 \vee \dots \vee x_n)$ such that every $x_i$ leads to a variable occurring as a negative literal
in $\chi \wedge u \wedge \overline{v}$. First assume that this OR-clause is the literal $u$. If $u$ would lead to a
variable occurring as a negative literal which is not the variable $v$, then by construction
of the algorithm, $\overline{u}$ would be a literal in $\chi$, a contradiction, since $\chi_{\mathrm{lit}} \wedge u$ is satisfiable.
Therefore, we have that $u \leadsto v$, and the claim follows.

Now assume that the OR-clause is not the variable $u$. Since variables leading to
negative literals are removed from OR-clauses by the minimization algorithm, all of
the $x_i$ lead to $v$. By construction of the algorithm, a literal $v$ is then introduced in $\chi$.
This is a contradiction, since $\chi_{\rightarrow} \wedge \chi_{\mathrm{lit}} \wedge \chi_{=} \wedge u \wedge \overline{v}$ is satisfiable. □
Fact 3 Let $\chi$ be a satisfiable $\Gamma$-formula such that $\min(\chi) = \chi$, $\chi$ implies $x = y$, and
$\chi$ does not imply $x$ or $\overline{x}$. Then $\chi_{=}$ implies $x = y$.

Proof. Write $u := x$ and $v := y$. From Fact 2, we know that $\chi_{\mathrm{lit}} \wedge \chi_{\rightarrow} \wedge \chi_{=}$ implies $(u \rightarrow v)$ and $(v \rightarrow u)$.
Therefore, since none of these variables appear as literals, we know that $u \leadsto v$ and
$v \leadsto u$. Therefore, $=$-clauses connecting $u$ and $v$ are introduced by the algorithm. □

After establishing these initial facts about the algorithm, we now prove that it
is correct, i.e., that $\min(\varphi)$ has a minimal number of clauses among all $\Gamma$-formulas
equivalent to $\varphi$. To prove this, let $\psi$ be a formula such that $\psi \equiv \varphi$. We show that
$|\min(\varphi)| \leq |\psi|$. Since $|\min(\psi)| \leq |\psi|$, it suffices to show that $|\min(\varphi)| \leq |\min(\psi)|$.
Hence it suffices to prove that for equivalent formulas $\varphi$ and $\psi$ such that $\min(\varphi) = \varphi$
and $\min(\psi) = \psi$, it follows that $|\varphi| \leq |\psi|$.

The main strategy of the remainder of the proof is to show that the above algorithm
performs a "separation" of the formula in components containing the "$\rightarrow$-part," the
"literal part" and the "$=$-part," which is, in a sense, uniquely determined: For the two
equivalent formulas $\psi$ and $\varphi$, the obtained parts are not necessarily identical, but they
are equivalent. This is the main reason why it is sufficient to only minimize these
"components" in our algorithm.

Fact 4 $\min(\psi)_{\rightarrow} \equiv \min(\varphi)_{\rightarrow}$.

Proof. Let $(u \rightarrow v)$ be a clause in $\min(\psi)_{\rightarrow}$. Then we know that $\psi$ does not imply
one of $u, \overline{u}, v, \overline{v}$: Assume that this is the case. From Fact 1, we then know that the
corresponding literal appears in the formula itself. In the case that $u$ appears, the clause
$(u \rightarrow v)$ would have been deleted by the algorithm, and replaced with the literal $v$. In
the case that $\overline{v}$ appears, the clause is replaced with $\overline{u}$. If $\overline{u}$ occurs, or $v$ occurs, then
the clause $(u \rightarrow v)$ is tautological and has been removed by the algorithm in step 26.
From Fact 2, we know (since $\min(\varphi)$, $\min(\psi)$, $\varphi$, and $\psi$ are all equivalent) that
$\varphi_{\mathrm{lit}} \wedge \varphi_{\rightarrow} \wedge \varphi_{=}$ implies $(u \rightarrow v)$. Due to the above, since $u$ and $v$ do not appear as
literals and only one variable for each connected $=$-component appears in the rest of
the formula, we know that $\varphi_{\rightarrow}$ implies $(u \rightarrow v)$. Therefore, $\varphi_{\rightarrow}$ implies every clause
in $\psi_{\rightarrow}$, and hence $\varphi_{\rightarrow}$ implies $\psi_{\rightarrow}$. Due to symmetry, it follows that these formulas are
equivalent. □

The following claims follow in a similar way: 

Fact 5 $\min(\psi)_{\mathrm{lit}} \equiv \min(\varphi)_{\mathrm{lit}}$.

Proof. This is obvious from Fact 1, since literals appear as literal clauses if and only if
they are implied by the formula, and the formulas are equivalent. □

Fact 6 $\min(\psi)_{=} \equiv \min(\varphi)_{=}$.

Proof. We know from Fact 1 that a variable $x$ such that the formula implies $x$ or $\overline{x}$ appears as a
positive or negative literal, and a variable appearing as a literal does not appear in an equality clause.
Let $(u = v)$ be a clause in $\psi_{=}$. Then $\varphi$ implies $(u \rightarrow v)$ and $(v \rightarrow u)$. Since both do
not appear as literals, Fact 2 then implies that $u \leadsto v$ and $v \leadsto u$. Therefore, equality
clauses between them have been introduced in $\varphi$, and hence $\varphi_{=}$ implies $(u = v)$. Thus
$\varphi_{=}$ implies $\psi_{=}$, and due to symmetry, they are equivalent. □
It remains to deal with the OR-components: We want to show that $\varphi_{\mathrm{OR}}$ and $\psi_{\mathrm{OR}}$
are equivalent as well. To show this requires a bit more work. Let $\mathcal{C}$ be the set of
OR-clauses which follow from $\varphi$, and which only contain variables occurring in $\varphi_{\mathrm{OR}}$ (note
that we do not have to construct this (potentially exponential) set in the algorithm).

Fact 7 Let $C$ be a $\leadsto$-minimal clause in $\mathcal{C}$. Then $C$ appears in $\min(\varphi)$ and in $\min(\psi)$.

Proof. Let $C = (x_1 \vee \dots \vee x_n)$. Since $\varphi$ implies $C$, we know that $\varphi \wedge \overline{x_1} \wedge \dots \wedge \overline{x_n}$
is unsatisfiable. Due to the remarks at the beginning of the proof, this means that there
is a clause $B = (y_1 \vee \dots \vee y_m)$ such that each of the $y_i$ leads (in $\varphi$) to a variable
occurring as a negative literal. Since variables leading to negative literals are removed
from OR-clauses by the algorithm, we know that each of the $y_i$ leads to one of the $x_j$.
Therefore, $B \leadsto C$. Since $C$ is $\leadsto$-minimal, we know that $C \leadsto B$ holds as well.

It remains to show that $B$ and $C$ contain the same variables. Assume that there is
some variable $x_i$ which does not appear in $B$. Since $B \leadsto C$ and $C \leadsto B$, we know
that $x_i$ leads to some variable $y_j$, and that $y_j$ leads to some $x_k$, which in turn leads to
some $y_l$. Since $\leadsto$ is transitive, it follows that $y_j \leadsto y_l$. If $y_j$ and $y_l$ would be different
variables, then $y_j$ would have been removed from $B$ by the algorithm. Therefore we
know that $y_j$ and $y_l$ are the same variables. Since $y_l \leadsto x_k \leadsto y_l$, we know that $\varphi$
implies $(y_l \rightarrow x_k)$ and $(x_k \rightarrow y_l)$, and hence $\varphi$ implies $x_k = y_l$. Since these variables
appear in $\varphi_{\mathrm{OR}}$, we know by construction that none of them appears as a literal, and
thus, from Fact 1, know that neither $x_k$, $y_l$, $\overline{x_k}$ nor $\overline{y_l}$ are implied by $\varphi$. From Fact 3, we
therefore know that $\varphi_{=}$ implies $x_k = y_l$. By construction, only the lexicographically
minimal of these two variables appears in $\varphi_{\mathrm{OR}}$, and since both appear, it follows that
they are the same variable. This is a contradiction to the assumption that $x_i$ does not
appear in $B$. Similarly, we can show that every variable from $B$ appears in $C$.

Since $\psi = \min(\psi)$, and $\varphi$ and $\psi$ are equivalent, the same argument can be used to
show that the clause appears in $\psi$. □

We now show the converse of the above fact: 

Fact 8 Let $C$ be a clause appearing in $\min(\varphi)$. Then $C$ is minimal in $\mathcal{C}$ with respect to $\leadsto$.

Proof. Assume that this is not the case. Since $\mathcal{C}$ is a finite set, this implies that there is
a minimal clause $B$ in $\mathcal{C}$ such that $B \leadsto C$. Due to Fact 7, we know that $B$ appears in
$\varphi_{\mathrm{OR}}$. Since $B \leadsto C$ and $C \not\leadsto B$ (since $C$ is not minimal), $C$ is removed from $\varphi$ by the
algorithm, a contradiction. □

Therefore we know that the clauses appearing in $\varphi_{\mathrm{OR}}$ are exactly the minimal
clauses in $\mathcal{C}$, and from Fact 7, we know that each of these also appears in $\psi_{\mathrm{OR}}$.
Therefore, the number of OR-clauses in $\varphi_{\mathrm{OR}}$ is bounded by the number of OR-clauses
in $\psi_{\mathrm{OR}}$. Due to symmetry, they are equal. Since the components containing literals,
equalities and implications have been minimized independently, the number of clauses
in $\varphi$ and $\psi$ is equal, which concludes the proof of Theorem 3.4. □
A careful analysis of the proof yields that it also holds true if $\Gamma$ does not contain all
the relations defining IHSB+, even though in these cases, only a restricted vocabulary
is available for the minimum formula.

Corollary 3.5 Let $\Gamma \subseteq \{\rightarrow, =, x, \overline{x}\} \cup \{\mathrm{OR}^m \mid m \in M\}$ for some finite set $M \subseteq \mathbb{N}$.
Then $\mathrm{MEE}(\Gamma) \in \mathrm{P}$.

Proof. This follows from the proof of the previous Theorem 3.4. If a relation is not
present in the input formula, it is not introduced (note that positive literals can be
written as OR-clauses), except for the case of the equality relation. Simply write this as
two implication clauses, and apply the proof of the above theorem, where implications
replace equalities. Note that the algorithm reduces OR-clauses in arity, therefore
potentially resulting in a clause that cannot be expressed by the constraint language $\Gamma$.
However, we can simply use an OR-clause of higher arity with multiple appearances
of variables. □

The previous two results covered the case that the constraint language $\Gamma$ contains
only the relations that define IHSB+. We will now show in Lemma 3.6 that irreducible
relations that are IHSB+ are very close to these "base relations." This lemma is used
in the proof of our main result on IHSB languages, Corollary 3.7, which shows that
our algorithm can be applied not only to the cases directly covered by Theorem 3.4, but
to every irreducible constraint language that is IHSB+ or IHSB-. For this, we need
some additional notation: We say that a relation $R$ is a permutation of a relation $S$ if
$R(x_1, \dots, x_n)$ is equivalent to $S(x_{\Pi(1)}, \dots, x_{\Pi(n)})$ for some permutation $\Pi$ on the set
$\{1, \dots, n\}$.

Lemma 3.6 Let $\Gamma = \{x, \overline{x}, \rightarrow, =, \mathrm{OR}^m \mid m \in \mathbb{N}\}$. Then every irreducible relation
which is IHSB+ is a permutation of an element of $\Gamma$.

Proof. Note that by definition, a relation is IHSB+ if and only if it can be expressed
by a $\Gamma$-formula (equality can be expressed as two implications). Let $R$ be a relation
that can be expressed with a $\Gamma$-formula, and let $n$ be its arity. By choice of $R$, there
is a formula $\varphi = R_1(x^1_1, \dots, x^1_{k_1}) \wedge \dots \wedge R_m(x^m_1, \dots, x^m_{k_m})$ which is equivalent to
$R(x_1, \dots, x_n)$, where each $x^i_j$ is an element of $\{x_1, \dots, x_n\}$, and $R_i \in \Gamma$. Without
loss of generality, we assume that no clause in $\varphi$ can be removed without changing the
represented relation, and that no variable appears twice in an OR-clause, and that no
variable can be removed from an OR-clause without changing the represented relation.

Since $R$ is irreducible, there is a clause $C$ in which every variable appears. First
assume that this clause is an $\mathrm{OR}^m$-clause, hence $C = (x_1 \vee \dots \vee x_n)$. If no other
clause appears in $\varphi$, then $R$ is the $n$-ary OR-relation, and hence an element of $\Gamma$, as
required. Therefore assume that there is another clause $C'$ in $\varphi$. Due to the minimality
of $\varphi$, $C'$ is not equivalent to $C$. If $C'$ is an OR-clause, then $C'$ contains a proper subset
of the variables occurring in $C$ (since in $C$, all variables occur), and hence the clause $C$
is redundant, a contradiction to the minimality of $\varphi$ (note that this also covers the case
where $C'$ is a positive literal). If $C'$ is a negative literal $\overline{x_i}$, then $x_i$ can be removed
from the clause $C$, a contradiction to the minimality. Therefore assume that $C'$ is an
implication, $C' = (x_i \rightarrow x_j)$. Since there are no superfluous clauses in $\varphi$, we know
that $x_i$ and $x_j$ are different variables. Then the variable $x_i$ can be removed from the
OR-clause $C$ without changing the represented relation, a contradiction.

Now assume that $C$ is not an OR-clause, hence $C$ is a literal or an implication. In
particular, the arity of $R$ is at most 2. If $R$ is a 1-ary relation, then $R$ obviously is
irreducible. Hence assume that $R$ is one of the 16 binary Boolean relations. We make
a complete case distinction. The empty relation cannot be an element of a constraint
language by definition. If $R$ only contains a single element, it can be written as a
conjunction of literals and therefore is not irreducible. If $R$ is the full binary relation
over the Boolean domain, it can be written as $T(x_1) \wedge T(x_2)$, where $T$ is the 1-ary
relation $\{(0), (1)\}$, and hence is not irreducible. It remains to consider the cases where
$R$ has exactly 2 or exactly 3 elements.

The relation $\{(0, 0), (0, 1)\}$ is not irreducible, since it can be written as $\overline{x_1} \wedge T(x_2)$.
Similarly, $\{(0, 0), (1, 0)\}$ is represented by $T(x_1) \wedge \overline{x_2}$. The relation $\{(0, 0), (1, 1)\}$
is the equality relation and an element of $\Gamma$, the relation $\{(0, 1), (1, 0)\}$ is not
IHSB+, $\{(0, 1), (1, 1)\}$ is not irreducible (it can be written as $T(x_1) \wedge x_2$), similarly
$\{(1, 0), (1, 1)\}$ can be written as $x_1 \wedge T(x_2)$.

Now consider the relations with exactly three elements: $\{(0, 0), (0, 1), (1, 0)\}$ is the
binary NAND and therefore not IHSB+, $\{(0, 0), (0, 1), (1, 1)\}$ is the implication and
therefore an element of $\Gamma$, $\{(0, 0), (1, 0), (1, 1)\}$ is a permutation of the implication,
and $\{(1, 0), (0, 1), (1, 1)\}$ is the binary OR and hence an element of $\Gamma$. □

The previous two theorems and Proposition 3.1 directly imply the following corollary,
which as mentioned is our main result for IHSB+/IHSB- constraint languages:

Corollary 3.7 Let $\Gamma$ be an irreducible constraint language which is IHSB+ or IHSB-.
Then $\mathrm{MEE}(\Gamma) \in \mathrm{P}$.

3.2.2 Bijunctive Formulas 

We now turn to the next of our polynomial-time cases, which covers constraint
languages which are bijunctive. Note that this is not the same as only showing that
general 2CNF formulas have an efficient minimization procedure (which was shown in
[Cha04]): In addition to being able to minimize arbitrary 2CNF, we also need to be
careful about only using those relations that are present in the constraint language.
Again, Example 3.2 shows that the prerequisite that $\Gamma$ is irreducible is necessary.

Theorem 3.8 Let $\Gamma$ be a constraint language which is irreducible and bijunctive. Then
$\mathrm{MEE}(\Gamma) \in \mathrm{P}$.

Proof. Since $\Gamma$ is bijunctive, every relation in $\Gamma$ can be written as a formula using only
at most binary relations. Since $\Gamma$ is also irreducible, this implies that every relation in
$\Gamma$ is at most binary. The only irreducible binary and unary relations over the Boolean
domain are (up to permutation of the variables) the literals $x$ and $\overline{x}$, the binary OR,
binary NAND, implication, equality, and exclusive OR. Since all of these relations can
be written as implications between literals, minimization can be performed analogously
to the proof of the previous Theorem 3.4. □
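
The claim that the irreducible binary relations can be written as implications between literals is easy to confirm mechanically. The Python sketch below lists one possible rewriting for each non-implication relation and checks it on all four assignments. The particular rewritings and all names are our own illustration; the proof itself does not fix a specific encoding.

```python
from itertools import product


def implies(a, b):
    """Truth value of the implication a -> b."""
    return (not a) or b


# name: (relation, rewriting as a conjunction of implications between literals)
REWRITES = {
    "or":    (lambda x, y: x or y,        lambda x, y: implies(not x, y)),
    "nand":  (lambda x, y: not (x and y), lambda x, y: implies(x, not y)),
    "equal": (lambda x, y: x == y,        lambda x, y: implies(x, y) and implies(y, x)),
    "xor":   (lambda x, y: x != y,        lambda x, y: implies(x, not y) and implies(not x, y)),
}


def rewrite_is_correct(name):
    """Compare a relation with its implication form on all four assignments."""
    relation, as_implications = REWRITES[name]
    return all(
        bool(relation(x, y)) == bool(as_implications(x, y))
        for x, y in product([False, True], repeat=2)
    )


results = {name: rewrite_is_correct(name) for name in REWRITES}
```

Viewed this way, a bijunctive formula becomes an implication graph over literals rather than over variables, which is what allows the path-based reasoning from the IHSB+ case to be reused.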
3.2.3 Affine Formulas



We conclude our polynomial-time results with the affine case. Affine formulas
represent linear equations over GF(2). We therefore can apply linear algebra techniques
to obtain an efficient minimization algorithm. Results for linear equations have been
obtained before [Cur84, Section 8]. We show here that the result covers all cases where
the language is affine and irreducible.

Theorem 3.9 Let $\Gamma$ be an irreducible and affine constraint language. Then $\mathrm{MEE}(\Gamma) \in \mathrm{P}$.

Proof. Let $\varphi$ be a $\Gamma$-formula. Since satisfiability testing for affine formulas can be done
in polynomial time, we can without loss of generality assume that $\varphi$ is satisfiable. Since
equivalence for affine formulas can be checked in polynomial time, we can compute
a formula which is equivalent to $\varphi$ and irredundant in the sense that if we remove a
clause, it is not equivalent to $\varphi$ anymore. Therefore, it suffices to prove that such an
irredundant formula already is minimum. Note that a minimum formula obviously is
irredundant. We therefore show that two affine formulas $\varphi_1$ and $\varphi_2$ with $|\mathrm{VAR}(\varphi_1)| =
|\mathrm{VAR}(\varphi_2)|$ which are both irredundant and are equivalent, have the same number of
clauses. In order to do this, we show that a satisfiable, irredundant formula $\varphi$ over $n$
variables with $k$ clauses has exactly $2^{n-k}$ solutions. Let the clauses be $C_1, \dots, C_k$.
Since every relation in $\Gamma$ is irreducible, each clause $C_i$ is of the form

$x^i_1 \oplus \dots \oplus x^i_{l_i} = c^i$

for variables $x^i_1, \dots, x^i_{l_i}$ and a constant $c^i \in \{0, 1\}$. This can equivalently be written as
$x^i_1 = (x^i_2 \oplus \dots \oplus x^i_{l_i} \oplus c^i)$. Since $\varphi$ is irredundant, we know that no clause $C_i$ follows
from the clauses $C_1, \dots, C_{i-1}$. Therefore, each clause restricts the possibilities of the
values of $x^i_1$, and therefore the relation $R$ represented by $C_1 \wedge \dots \wedge C_i$ is a proper
subset of the relation $R'$ represented by $C_1 \wedge \dots \wedge C_{i-1}$. Since these relations are
represented by affine formulas, their cardinalities are powers of 2. Therefore, $|R| \leq \frac{|R'|}{2}$.
Since only one variable is restricted in the clause $C_i$, it follows that $|R| = \frac{|R'|}{2}$,
as claimed. □
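
The counting argument can be checked on small systems with elementary linear algebra over GF(2). In the Python sketch below, each clause is encoded as an integer bitmask with one bit per variable and one extra bit for the constant; this encoding, the helper names, and the example system are assumptions made for illustration. For a satisfiable system the rank of these rows equals the number of clauses of any irredundant equivalent formula, and the number of solutions is 2 to the power of n minus the rank.

```python
from itertools import product


def encode(variables, constant, n):
    """Bitmask of the equation: bit v for each variable index v, bit n for the constant."""
    return sum(1 << v for v in variables) | (constant << n)


def rank_gf2(rows):
    """Rank over GF(2) of rows given as integer bitmasks (XOR-basis elimination)."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)  # clears the leading bit of b if it is set in row
        if row:
            basis.append(row)
    return len(basis)


def count_solutions(equations, n):
    """Brute-force count of assignments satisfying all XOR-equations."""
    return sum(
        all(sum(bits[v] for v in vs) % 2 == c for vs, c in equations)
        for bits in product((0, 1), repeat=n)
    )


n = 4
# x0 + x1 = 1, x1 + x2 + x3 = 0, x0 + x3 = 1 (an irredundant system)
equations = [([0, 1], 1), ([1, 2, 3], 0), ([0, 3], 1)]
rank = rank_gf2([encode(vs, c, n) for vs, c in equations])
solutions = count_solutions(equations, n)
```

Adding the clause x1 + x3 = 0, which is the sum of the first and third equation, changes neither the rank nor the solution count, matching the observation that a redundant clause can be removed without changing the represented relation.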
3.3 Hardness Results 

As mentioned before, our polynomial-time results cover all cases where polynomial- 
time algorithms can be expected. We now prove hardness results for most of the re- 
maining cases. 

3.3.1 Minimization and Satisfiability 

For unrestricted propositional formulas, the MEE problem is obviously coNP-hard,
since a formula $\varphi$ is unsatisfiable if and only if the all-1-assignment does not satisfy
it, and it has a minimum equivalent expression of size 0 (which then only can be the
constant 0). The following result uses the same idea of reducing the complement of
the satisfiability problem to the minimization problem. However, since the constant 0
is usually not available in our constraint languages and we are minimizing the number
of clauses, the proof is a bit more involved, while still following the same pattern.
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Theorem 3.10 Let T be a finite constraint language. Then SAT (T) <f„ MEE(r). 

Proof. Let φ_min be an unsatisfiable Γ-formula with a minimal number of clauses. Let
k_min be the number of clauses in φ_min.

The reduction works as follows: Let φ be a Γ-formula. First compute the set M
consisting of all Γ-formulas containing at most k_min clauses with variables appearing
in φ. Note that, since Γ is a finite constraint language, M has polynomial size, and each
formula in M has a number of appearing variables bounded by a constant. Therefore
we can, in polynomial time, construct for each ψ ∈ M the set of all solutions of ψ.
For such a solution I, let I^ext be the assignment which agrees with I for all variables
appearing in ψ, and assigns 0 to all other variables.

For each assignment I^ext, check if it is a solution of φ. If this is the case for some
I^ext, let (φ', k') be a negative instance of MEE(Γ). Otherwise, let (φ', k') := (φ, k_min).
We show that φ is unsatisfiable if and only if (φ', k') ∈ MEE(Γ).

First assume that φ is unsatisfiable. In particular, in this case it holds that (φ', k') =
(φ, k_min). We can, without loss of generality, assume that φ_min contains at most one
variable, since formulas obtained from unsatisfiable formulas via variable identification
remain unsatisfiable. In particular, we can assume that in φ_min, only variables from φ
appear. Since φ is unsatisfiable, φ is equivalent to φ_min, and hence (φ', k') ∈ MEE(Γ).

Now assume that φ is satisfiable, and assume indirectly that (φ', k') ∈ MEE(Γ).
By choice of (φ', k'), this implies that (φ', k') = (φ, k_min). Since M contains all
Γ-formulas with at most k_min clauses, it follows that φ is equivalent to some formula
ψ ∈ M. Since φ is satisfiable, so is ψ. Therefore there is some solution I of ψ, and
I^ext satisfies ψ as well. Since φ and ψ are equivalent, it follows that I^ext satisfies φ.
This is a contradiction, since in this case, the reduction does not produce the instance
(φ, k_min). □
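
The reduction in the proof of Theorem 3.10 can be sketched in code as follows. This is only an illustrative sketch under stated assumptions: the encoding of relations and formulas, the function names, and the caller-supplied `negative_instance` are all hypothetical and not part of the paper. The brute-force solution search is affordable because each candidate formula has a constant number of variables.

```python
from itertools import combinations, product

# Illustrative encoding (an assumption of this sketch, not the paper's notation):
# gamma maps a relation name to (arity, set of 0/1 tuples); a clause is
# (relation_name, variable_tuple); a formula is a sequence of clauses.

def formula_vars(formula):
    """Sorted list of the variables occurring in a formula."""
    return sorted({v for _, vs in formula for v in vs})

def satisfies(formula, gamma, assignment):
    """True if the assignment (a dict) satisfies every clause."""
    return all(tuple(assignment[v] for v in vs) in gamma[name][1]
               for name, vs in formula)

def solutions(formula, gamma):
    """All solutions of a formula over its own variables, by brute force."""
    vs = formula_vars(formula)
    candidates = (dict(zip(vs, bits)) for bits in product((0, 1), repeat=len(vs)))
    return [a for a in candidates if satisfies(formula, gamma, a)]

def reduce_to_mee(phi, gamma, k_min, negative_instance):
    """Map phi to an MEE instance, following the proof of Theorem 3.10."""
    phi_vars = formula_vars(phi)
    all_clauses = [(name, vs) for name, (arity, _) in gamma.items()
                   for vs in product(phi_vars, repeat=arity)]
    # M: all formulas with at most k_min clauses over the variables of phi.
    for size in range(k_min + 1):
        for psi in combinations(all_clauses, size):
            for sol in solutions(psi, gamma):
                # Extend the solution by 0 on the variables not occurring in psi.
                extended = {v: sol.get(v, 0) for v in phi_vars}
                if satisfies(phi, gamma, extended):
                    return negative_instance  # phi is satisfiable
    return (phi, k_min)

# Toy language: unary relations T = {1} and F = {0}. T(x) AND F(x) is a
# smallest unsatisfiable formula, so k_min = 2 for this language.
GAMMA = {"T": (1, {(1,)}), "F": (1, {(0,)})}
PHI_UNSAT = [("T", ("x",)), ("F", ("x",))]
PHI_SAT = [("T", ("x",)), ("F", ("y",))]
```

On the toy language, `reduce_to_mee(PHI_UNSAT, GAMMA, 2, "NEGATIVE")` keeps the input formula with bound 2, while the satisfiable `PHI_SAT` is mapped to the supplied negative instance.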

If a constraint language Γ is not Schaefer (i.e., neither Horn, dual Horn, bijunctive,
nor affine), then the satisfiability problem for Γ+ = Γ ∪ {x, ¬x} (Γ extended with the
possibility to express literals) is NP-complete. The previous theorem therefore yields
the following corollary:

Corollary 3.11 Let Γ be a constraint language that is not Schaefer. Then MEE(Γ+)
is coNP-hard.

3.3.2 NP-completeness Results 

In this section we consider the MEE problem for irreducible constraint languages that
are Horn, but not IHSB−. We show that for these languages, the MEE problem is
NP-complete. This shows that the algorithm we developed in the previous section for the
IHSB+/IHSB− case cannot be modified to work with larger classes of formulas
(remember that IHSB− formulas are a subset of Horn formulas). Due to Proposition 3.1,
the analogous result is true for dual Horn and IHSB+. We first prove a result about
what the irreducible relations here look like:

Theorem 3.12 Let Γ be an irreducible constraint language such that Γ is Horn, but not
IHSB−. Then there is a relation R ∈ Γ which can be expressed by (x_1 ∧ ⋯ ∧ x_k → y)
for some k ≥ 2.

Proof. Let Γ_Horn := {NAND^k, (x_1 ∧ ⋯ ∧ x_k → y) | k ∈ ℕ}. Since Γ is Horn, it
follows from [CKZ07] that every relation in Γ can be written as a Γ_Horn-formula. We
show that every relation in Γ with an arity of n ≥ 3 is an element of Γ_Horn. Therefore,
let R be such a relation, and let φ be a Γ_Horn-formula representing R, i.e., a formula
equivalent to R(x_1, …, x_n), and assume that φ is minimal in the sense that no clause
can be deleted, and no variable can be removed from a clause without changing the
relation expressed by the formula.

Since R is irreducible, there is a clause C in φ such that every variable x_1, …, x_n
appears in C. If C is the only clause in φ, then it follows that R ∈ Γ_Horn as claimed.
Hence assume that there is another clause C' in φ. The variables of C' then must be a
subset of the variables in C. We make a case distinction.

First assume that both clauses are NAND-clauses. If C and C' contain the same
variables, then C and C' are equivalent, a contradiction. Therefore the variables
appearing in C' are a proper subset of the variables from C. Hence C' implies C, and C
can be removed from φ without changing the represented relation, a contradiction.

Now assume that C is a NAND-clause and C' is of the form (x_{i_1} ∧ ⋯ ∧ x_{i_k} → x_j).
Then the variable x_j can be removed from the clause C, a contradiction to the
minimality of φ.

Assume that C is (without loss of generality) of the form (x_1 ∧ ⋯ ∧ x_{n−1} → x_n),
and C' of the form (x_{i_1} ∧ ⋯ ∧ x_{i_k} → x_j). If x_j and x_n are the same variable, then
C' implies C, and C can be removed from φ without changing the relation expressed
by the formula, a contradiction to the minimality of φ. Hence assume that x_j is one of
the variables x_1, …, x_{n−1}. Then the clause (x_1 ∧ ⋯ ∧ x_{n−1} → x_n) can be replaced
with (x_1 ∧ ⋯ ∧ x_{j−1} ∧ x_{j+1} ∧ ⋯ ∧ x_{n−1} → x_n), a contradiction to the minimality
of φ.

Finally assume that C is of the form (x_1 ∧ ⋯ ∧ x_{n−1} → x_n), and C' is a
NAND-clause, say C' = NAND(x_{i_1}, …, x_{i_k}). First assume that the variables in C' contain
the variable x_n; without loss of generality assume that i_1 = n. We prove that C can
be replaced by the clause C'' = (¬x_1 ∨ ⋯ ∨ ¬x_{n−1}), which is a contradiction to the
minimality of φ. Therefore, let I be an assignment satisfying C and C', and indirectly
assume that I ⊭ C''. Then I(x_1) = ⋯ = I(x_{n−1}) = 1. Since I ⊨ C, it follows that
I(x_n) = 1, and hence I does not satisfy C', a contradiction. For the other direction,
it is obvious that C'' implies C. Therefore it remains to consider the case that the
variables in C' do not contain the variable x_n. In this case it is obvious that C' implies
C, and hence C can be removed from φ, a contradiction.
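
The clause manipulations used in the case distinction can be sanity-checked by brute force on a small instance. The following sketch does this for n = 3 with one concrete choice of C' per case; the helper names and the chosen instances are assumptions made for illustration, and a check for one small n is of course not a proof.

```python
from itertools import product

def implies(a, b):
    return bool((not a) or b)

def nand(*xs):
    return not all(xs)

def same_relation(f, g, n=3):
    """True if f and g agree on all 0/1 assignments to n variables."""
    return all(bool(f(*bits)) == bool(g(*bits))
               for bits in product((0, 1), repeat=n))

def entails(f, g, n=3):
    """True if every assignment satisfying f also satisfies g."""
    return all(implies(f(*bits), g(*bits))
               for bits in product((0, 1), repeat=n))

CHECKS = {
    # C = NAND(x1,x2,x3), C' = (x1 -> x3): x3 can be dropped from C.
    "nand_with_implication": same_relation(
        lambda x1, x2, x3: nand(x1, x2, x3) and implies(x1, x3),
        lambda x1, x2, x3: nand(x1, x2) and implies(x1, x3)),
    # C = (x1 & x2 -> x3), C' = (x1 -> x3): C' implies C, so C is redundant.
    "same_head": entails(
        lambda x1, x2, x3: implies(x1, x3),
        lambda x1, x2, x3: implies(x1 and x2, x3)),
    # C = (x1 & x2 -> x3), C' = (x1 -> x2): x2 can be dropped from C.
    "head_in_body": same_relation(
        lambda x1, x2, x3: implies(x1 and x2, x3) and implies(x1, x2),
        lambda x1, x2, x3: implies(x1, x3) and implies(x1, x2)),
    # C = (x1 & x2 -> x3), C' = NAND(x1,x3): C can be replaced by NAND(x1,x2).
    "nand_containing_head": same_relation(
        lambda x1, x2, x3: implies(x1 and x2, x3) and nand(x1, x3),
        lambda x1, x2, x3: nand(x1, x2) and nand(x1, x3)),
    # C' = NAND(x1,x2) does not mention x3 and implies C = (x1 & x2 -> x3).
    "nand_without_head": entails(
        lambda x1, x2, x3: nand(x1, x2),
        lambda x1, x2, x3: implies(x1 and x2, x3)),
}
```

Each entry of `CHECKS` corresponds to one case of the argument above; in every case the formula containing C is not minimal.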

We therefore have proven that every element of Γ of arity at least 3 is an element of
Γ_Horn. In order to prove the theorem, assume that Γ does not contain a relation of the
required form. Then every relation in Γ is either at most binary, or a NAND-relation of
some arity. The only non-empty binary relations (up to permutation) over the Boolean
domain which are Horn and irreducible are the following:

• conjunctions of literals,

• the full relation,

• implication,

• x ∧ T(y), ¬x ∧ T(y),

• equality,

• binary NAND

(the exclusive-OR relation and the binary OR are not invariant under conjunction,
and therefore not Horn). Therefore, Γ can only contain NANDs and the relations in the
list above; this implies that Γ is IHSB−, a contradiction. □
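
The remark about invariance under conjunction can be checked directly: a Boolean relation is Horn exactly when it is closed under coordinate-wise AND of its tuples. The sketch below tests this for several binary relations; the dictionary of relations and the function name are illustrative choices, not notation from the paper.

```python
from itertools import product

def closed_under_and(rel):
    """True if the coordinate-wise AND of any two tuples of rel is again in rel."""
    return all(tuple(a & b for a, b in zip(s, t)) in rel
               for s in rel for t in rel)

# Binary relations given as sets of (x, y) tuples.
BINARY = {
    "full": set(product((0, 1), repeat=2)),
    "implication": {(0, 0), (0, 1), (1, 1)},
    "equality": {(0, 0), (1, 1)},
    "nand": {(0, 0), (0, 1), (1, 0)},
    "xor": {(0, 1), (1, 0)},
    "or": {(0, 1), (1, 0), (1, 1)},
}

IS_HORN = {name: closed_under_and(rel) for name, rel in BINARY.items()}
```

For both exclusive-OR and binary OR, the tuples (0, 1) and (1, 0) have the conjunction (0, 0), which lies outside the relation, so these two entries of `IS_HORN` are false while the others are true.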

The previous theorem shows that every constraint language that is Horn but not
IHSB− contains a clause which follows the same pattern as the clauses defining
positive Horn. Hence it is not surprising that the proof of the main result of [Bv94] can
also be used to show the following:

Theorem 3.13 Let Γ be an irreducible constraint language such that Γ is Horn, but
not IHSB−. Then MEE(Γ) is NP-complete.

Proof. The problem is in NP, since equivalence testing for Horn formulas can be
performed in polynomial time [BHRV02]. NP-hardness follows using techniques
from [Bv94]: One of the main results of that paper is that there exists a reduction f
from the well-known NP-complete Hamiltonian path problem to a DNF minimization
problem which has the following properties. We say that a formula is a
pure-Horn-3-DNF if it is a disjunction of clauses, and each clause is a conjunction of 2 or 3
literals, with exactly one negative literal.

1. For each graph G with m edges, the formula f(G) is a pure-Horn-3-DNF,

2. If G has a Hamiltonian path, then there is a pure-Horn-3-DNF containing m + 2
clauses equivalent to f(G),

3. If G does not have a Hamiltonian path, then there is no DNF containing at most
m + 2 clauses equivalent to f(G).

We describe the obvious procedure to use this result as a proof of the hardness
result for MEE(Γ). It is obvious that the negation of a pure-Horn-3-DNF formula φ
can be written as a Γ-formula CNF(φ), as the conjunction of the following clauses:

• For a clause x ∧ ¬y in φ, introduce a clause (x → y),

• For a clause x ∧ y ∧ ¬z in φ, introduce a clause (x ∧ y → z).

These clauses can obviously be constructed using the relation (x_1 ∧ ⋯ ∧ x_k → y)
and variable identification. We claim that G has a Hamiltonian path if and only if
(CNF(f(G)), m + 2) is a positive instance of MEE(Γ).
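
The translation from a pure-Horn-3-DNF to the formula CNF(φ), together with the variable identification step, can be sketched as follows. The data representation (a DNF clause as a pair of positive variables and one negated variable) and all function names are assumptions made for this illustration only.

```python
from itertools import product

def eval_dnf(dnf, a):
    """A DNF clause (pos, neg) stands for: AND of pos, AND NOT neg."""
    return any(all(a[p] for p in pos) and not a[neg] for pos, neg in dnf)

def cnf_of(dnf):
    """Turn each DNF clause (pos, neg) into the implication (AND of pos -> neg)."""
    return [(tuple(pos), neg) for pos, neg in dnf]

def eval_cnf(cnf, a):
    return all((not all(a[p] for p in body)) or bool(a[head])
               for body, head in cnf)

def is_negation(dnf, cnf):
    """Brute-force check that cnf is equivalent to the negation of dnf."""
    vs = sorted({v for pos, neg in dnf for v in (*pos, neg)})
    return all(eval_dnf(dnf, a) != eval_cnf(cnf, a)
               for a in (dict(zip(vs, bits))
                         for bits in product((0, 1), repeat=len(vs))))

def as_gamma_clause(body, head, k):
    """Arguments for the relation (x_1 & ... & x_k -> y), obtained by
    repeating the last body variable (variable identification)."""
    if not 1 <= len(body) <= k:
        raise ValueError("body must have between 1 and k variables")
    return tuple(body) + (body[-1],) * (k - len(body)) + (head,)

# Example formula: (x AND NOT y) OR (x AND y AND NOT z).
DNF = [(("x",), "y"), (("x", "y"), "z")]
CNF = cnf_of(DNF)
```

For instance, with k = 3 the clause (x → y) is written as the relation applied to (x, x, x, y), which is what `as_gamma_clause(("x",), "y", 3)` produces.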

First assume that G has a Hamiltonian path. Then, due to the above, f(G) has
an equivalent pure-Horn-3-DNF formula ψ with at most m + 2 clauses. Since ψ is
equivalent to f(G), it follows that CNF(ψ) is equivalent to CNF(f(G)), and also has
at most m + 2 clauses. For the other direction, assume that there is a Γ-formula ψ with
at most m + 2 clauses which is equivalent to CNF(f(G)). From the case distinction
in the proof of Theorem 3.12 it is easy to see that every relation in Γ can be written as
a disjunction of literals. Therefore, the negation of ψ is a DNF-formula with at most
m + 2 clauses which is equivalent to f(G). From the above, it follows that G has a
Hamiltonian path, which completes the proof of the theorem. □

The NP-hardness result for constraint languages dealing with Horn logics now
follows as a corollary:

Corollary 3.14 Let Γ be an irreducible constraint language that is Schaefer, not affine,
not bijunctive, not IHSB+, and not IHSB−. Then MEE(Γ) is NP-complete.

Proof. From the well-known classification of constraint languages with respect to their
expressive power, it follows that Γ is either Horn and not IHSB−, or dual Horn and
not IHSB+. For the first case, the result follows from the theorems in this section;
Proposition 3.1 then implies the result for the dual Horn case. □

3.4 Classification Theorem 

We can now state our main classification theorem. It follows from the results in the
previous sections, and the fact that, by definition, a constraint language which is not
affine, bijunctive, IHSB+, IHSB−, Horn, or dual Horn, is not Schaefer.

Theorem 3.15 Let Γ be an irreducible constraint language.

1. If Γ is affine, bijunctive, IHSB+, or IHSB−, then MEE(Γ) ∈ P,

2. Otherwise, if Γ is Horn or dual Horn, then MEE(Γ) is NP-complete,

3. Otherwise, Γ is not Schaefer, and MEE(Γ+) is coNP-hard.
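
The three cases of Theorem 3.15 can be read as a decision procedure. The sketch below is a hedged illustration: it tests the four Schaefer properties through the standard closure characterizations (Horn: coordinate-wise AND, dual Horn: OR, bijunctive: ternary majority, affine: ternary XOR), while the IHSB+ and IHSB− properties are passed in as flags by the caller and irreducibility is assumed rather than checked. All names are hypothetical.

```python
from itertools import product

def closed_under(rel, op, arity):
    """True if applying op coordinate-wise to any `arity` tuples of rel
    yields a tuple of rel again."""
    return all(tuple(op(*column) for column in zip(*rows)) in rel
               for rows in product(rel, repeat=arity))

def classify(relations, ihsb_plus=False, ihsb_minus=False):
    """Case of Theorem 3.15 for a language assumed to be irreducible."""
    def every(op, arity):
        return all(closed_under(r, op, arity) for r in relations)

    horn = every(lambda a, b: a & b, 2)
    dual_horn = every(lambda a, b: a | b, 2)
    bijunctive = every(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
    affine = every(lambda a, b, c: a ^ b ^ c, 3)

    if affine or bijunctive or ihsb_plus or ihsb_minus:
        return "in P"
    if horn or dual_horn:
        return "NP-complete"
    return "MEE(Gamma+) is coNP-hard"

# Example relations as sets of 0/1 tuples.
IMPL3 = set(product((0, 1), repeat=3)) - {(1, 1, 0)}   # x1 & x2 -> y
EQ = {(0, 0), (1, 1)}                                  # x = y
ONE_IN_THREE = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}       # exactly one variable is 1
```

The relation `IMPL3` is Horn but neither affine nor bijunctive, so it falls into the second case; `EQ` is affine, and `ONE_IN_THREE` satisfies none of the four closure properties.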

While the theorem does not completely classify the complexity of the MEE problem
for all irreducible constraint languages, we consider it unlikely that there exist more
polynomial-time cases than the ones we discovered: To the best of our knowledge, no
decision problem for non-Schaefer constraint languages has been proven to be in
polynomial time except for trivial cases (satisfiability of Γ-formulas can be tested in
polynomial time if every relation from Γ contains the all-0 or all-1-tuple). Also, for these
languages Γ, already testing equivalence of formulas is coNP-hard. This implies that,
unless P = NP, there cannot be a polynomial-time algorithm that, given a Γ-formula,
computes its "canonical" (i.e., up to differences checkable by a polynomial-time
algorithm) minimum equivalent expression (this would immediately solve the equivalence
problem in polynomial time). We are therefore confident that our classification covers
all polynomial-time cases for irreducible constraint languages.

It is worth noting that the prerequisite that Γ is irreducible is certainly required for
the polynomial-time cases, as the earlier example highlighted. For the hardness results,
this is less clear: the coNP-hardness does not rely on this prerequisite at all, and for
the NP-complete Horn cases, we consider it unlikely that there is a constraint language
with the same expressive power that does not directly encode positive Horn.

4 Conclusion and Open Questions



We have studied the complexity of the minimization problem for restricted classes 
of propositional formulas in two settings, obtained a complete characterization of all 
tractable cases in the Post case, and a large class of tractable cases in the constraint 
case. 

Open questions include the exact classification of the coNP-hard cases. It is likely
that most of them are NP-hard as well. It would be very interesting to determine
whether some of these are actually Σ^p_2-complete (this does not follow directly from the
Σ^p_2-completeness of the minimization problem for CNF formulas [Uma01], since our
constraint languages Γ and bases B are finite).

Finally, it would be very interesting to understand how non-irreducibility influences 
the complexity. 